CS4132 Data Analytics

"Indie Games: The Underdogs of Gaming" by Ayden Ang


Motivation and Background

The video game industry has been growing for the past couple of years, with new games being developed and released constantly. Many of these games are developed by big studios that employ hundreds of people and have massive amounts of funding, allowing them to dedicate substantial money and manpower to development. These are what people call Triple-A (AAA) games: games distributed by large, well-known companies such as Sony and Microsoft.

However, a negative stigma has started to surround AAA games. For a variety of reasons, such as microtransactions, buggy and incomplete releases, and a lack of risk-taking, many recently released AAA games have been widely regarded as poorer in quality and more lackluster to play. This is largely because AAA game studios have come to focus more on profits than on creating a truly enjoyable game, putting quantity over quality.

Since AAA games are on the decline, there is a gap in the video game industry to be filled. This is where indie games come in. Indie games are developed by small groups of individuals without the technical and financial resources of a AAA game studio. Because of this, the scope of indie games is usually much more limited than that of AAA games, and it can be hard for indie developers to build a game without the support of a large team. Despite this, many indie games are extremely high in quality, comparable to AAA games and perhaps even better. Indie game developers also have the freedom to pursue whatever game idea they want, something that AAA game studios do not have, resulting in a greater variety of indie games.

Indie games are quickly rising in popularity, not only among consumers but also among game developers. Gamers are starting to see the appeal of indie games, increasing the demand for them. Many of the games sold on Steam, one of the most popular video game distribution services, are indie games, which suggests that many gamers want to purchase and play them. Indie games are also encouraging more individuals to begin their journey in game development, with more tools and services for developing indie games becoming available recently. One of these services is itch.io, a website where anybody can host and sell their own indie games independently. While it is not as popular as Steam as a distribution service, it is focused completely on indie games, allowing indie developers to publish and sell their own passion projects online.

Therefore, I would like to analyse the rise in popularity of indie games among both consumers and game developers, as well as how they compare to AAA games.

Summary of Research Questions & Results

  1. How popular are indie games compared to AAA games? While it is known that indie games are on the rise, it is still unclear whether they have grown to the point that they can overthrow AAA games. It is also unclear when exactly the upsurge of indie gaming truly began. Therefore, I want to compare the popularity of indie games to that of AAA games over different time frames, to see the trends and visualise how indie games are catching up to AAA games.
  2. What are the major differences between indie games and AAA games? Indie games and AAA games differ greatly, both in development and in gameplay. While AAA games often have a large budget and a lot of manpower allocated to them, indie games do not have that luxury. As a result, while AAA games can have a larger scope and more content, indie games have to find other ways to appeal to consumers. Therefore, I want to analyse the major differences in the trends of indie games and AAA games.
  3. What factors contribute to the success of an indie game? While indie gaming as a whole has been on the rise, not all indie games are equally popular. Some have become as famous as AAA games, while others have a smaller player base. Therefore, I want to analyse what factors affect the popularity of indie games and what allows some indie games to become successful. Perhaps there are differing trends in indie games when they are grouped by popularity.
  4. What factors do indie game developers have to consider when developing an indie game? Because indie game developers do not have much manpower or many financial resources, they often have limited options and cannot develop games with very large scopes. This makes indie game development very tough, yet many individuals and small groups are still able to develop a finished product. Therefore, I want to find out how indie game developers make development more manageable. Perhaps there are tools and software that are more popular among developers because they make indie game development easier, or types of games that are more popular to develop than others.

Dataset

  1. https://steamdb.info/stats/gameratings/?all This website contains the biggest list of Steam games that I could scrape. The HTML of this page lists a total of 58410 Steam games. I used it to get the ID, name and distribution of positive and negative reviews of each game.
  2. https://steamspy.com/api.php This is the API link of a website that lists various information about Steam games. I used this API to obtain the estimated range of the number of owners of each game, the price of each game, and the average and median playtime of players for each game.
  3. https://store.steampowered.com/ This is the official site of Steam and contains the store page of every Steam game. I used the store page of each game to get its developers, publishers, release date, languages, genres and tags.
  4. https://steamcharts.com/ This website contains statistics on the concurrent players of each Steam game. I used this site to obtain the average concurrent players per month and the peak concurrent players per month for each game.
  5. https://itch.io/games/top-rated This is the official site of itch.io and lists the top-rated itch.io games. I used this site to obtain various information from the store pages of 13968 itch.io games.

Methodology

All relevant imports are listed here.

In [1]:

Data Acquisition

all_steam_games_info.csv contains all of the info collected for the 58410 Steam games.

  • id: The ID of the game.
  • name: The name of the game.
  • total_reviews: The total number of reviews of the game.
  • positive_reviews: The number of positive reviews of the game.
  • negative_reviews: The number of negative reviews of the game.
  • rating: The percentage of reviews for the game that are positive.
  • owners: The estimated number of owners of the game. In the rows shown below, this is the midpoint of min_owners and max_owners.
  • min_owners: The minimum estimated number of owners of the game.
  • max_owners: The maximum estimated number of owners of the game.
  • avg_playtime: The average playtime of players of the game in hours.
  • median_playtime: The median playtime of players of the game in hours.
  • price: The price of the game in dollars.
  • date: The release date of the game.
  • developers: The developers of the game. Stored in a list.
  • publishers: The publishers of the game. Stored in a list.
  • languages: The available languages that the game is in. Stored in a list.
  • genres: The genres of the game. Stored in a list.
  • tags: The tags of the game. Stored in a list.
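Because the list-valued columns are stored in a CSV file, they come back from pandas as strings such as "['Action', 'Adventure']" rather than as Python lists. The following is a minimal sketch of how such a column could be turned back into real lists. It uses a tiny in-memory CSV built from two rows of the table in place of all_steam_games_info.csv, and the names sample_csv, steam_sample and genres_parsed are assumptions made for this illustration, not names from the original notebook.

```python
import ast
import io

import pandas as pd

# Stand-in for all_steam_games_info.csv: two rows, reduced to three columns.
sample_csv = io.StringIO(
    "id,name,genres\n"
    "620,Portal 2,\"['Action', 'Adventure']\"\n"
    "413150,Stardew Valley,\"['Indie', 'RPG', 'Simulation']\"\n"
)
steam_sample = pd.read_csv(sample_csv)

# After read_csv, each genres value is a plain string, not a list.
print(type(steam_sample["genres"][0]))

# ast.literal_eval safely evaluates the string as a Python literal (here, a list).
steam_sample["genres_parsed"] = steam_sample["genres"].apply(ast.literal_eval)
print(steam_sample["genres_parsed"][1])
```

Keeping the parsed lists in a separate column leaves the original strings available for simple exact-match checks during cleaning.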
In [2]:
Out[2]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
0 620 Portal 2 302395 298749 3646 98.79 15000000 10000000 20000000 16.633333 8.266667 9.99 18 Apr, 2011 ['Valve'] ['Valve'] ['English', 'French', 'German', 'Spanish - Spa... ['Action', 'Adventure'] ['Platformer', 'Puzzle', 'Dark Humor', 'First-...
1 1118200 People Playground 129491 128053 1438 98.89 3500000 2000000 5000000 30.866667 12.166667 9.99 23 Jul, 2019 ['mestiez'] ['Studio Minus'] ['English'] ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod...
2 1794680 Vampire Survivors 114325 113067 1258 98.90 3500000 2000000 5000000 22.616667 14.700000 2.99 17 Dec, 2021 ['poncle'] ['poncle'] ['English'] ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet...
3 1145360 Hades 194006 191332 2674 98.62 7500000 5000000 10000000 35.166667 18.866667 24.99 17 Sep, 2020 ['Supergiant Games'] ['Supergiant Games'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac...
4 413150 Stardew Valley 485586 476588 8998 98.15 15000000 10000000 20000000 66.883333 30.433333 14.99 26 Feb, 2016 ['ConcernedApe'] ['ConcernedApe'] ['English', 'German', 'Spanish - Spain', 'Japa... ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 Construction Machines 2014 311 36 275 11.58 35000 20000 50000 0.483333 0.483333 6.99 28 Mar, 2014 ['GameCask'] ['GameCask'] ['English', 'French', 'German', 'Polish'] ['Simulation'] ['Simulation', 'Building', 'Singleplayer']
58406 210490 Fray 50 1 49 2.00 10000 0 20000 0.000000 0.000000 0.00 19 Jun, 2012 ['Brain Candy'] ['Brain Candy'] ['English'] ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie']
58407 257930 Race To Mars 261 20 241 7.66 10000 0 20000 0.000000 0.000000 0.00 7 Mar, 2014 ['INTERMARUM', 'ONE MORE LEVEL'] ['ONE MORE LEVEL'] ['English', 'Polish'] ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '...
58408 1180320 三国杀 15824 2130 13694 13.46 10000 0 20000 0.000000 0.000000 NaN 17 Dec, 2021 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu...
58409 397760 Urban War Defense 79 2 77 2.53 10000 0 20000 20.700000 20.700000 5.99 7 Jul, 2017 ['Budgie Games'] ['Budgie Games'] ['English'] ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'...

58410 rows × 18 columns

all_steam_games_avg_players_per_month.csv contains the average concurrent players per month for each Steam game. The ID of the game is in the leftmost column, while the remaining 121 columns are the average concurrent players for each month from July 2012 to July 2022.
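A table with one column per month is convenient to read but awkward to plot over time. As a rough sketch, and not necessarily how the analysis later in this report handles it, the wide layout could be reshaped into one row per game and month with DataFrame.melt. The two games and two months below are copied from the table; avg_sample, long_sample and the column names month and avg_players are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Two games and two month columns taken from the wide table.
avg_sample = pd.DataFrame({
    "id": [620, 1118200],
    "July 2012": [4167.67, np.nan],
    "August 2012": [2480.58, np.nan],
})

# One row per (game, month) pair instead of one column per month.
long_sample = avg_sample.melt(id_vars="id", var_name="month", value_name="avg_players")

# Column labels such as "July 2012" become timestamps for the first day of that month.
long_sample["month"] = pd.to_datetime(long_sample["month"], format="%B %Y")

# Months with no recorded players (NaN) are dropped.
long_sample = long_sample.dropna(subset=["avg_players"])
print(long_sample)
```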

In [3]:
Out[3]:
id July 2012 August 2012 September 2012 October 2012 November 2012 December 2012 January 2013 February 2013 March 2013 ... October 2021 November 2021 December 2021 January 2022 February 2022 March 2022 April 2022 May 2022 June 2022 July 2022
0 620 4167.67 2480.58 1853.0 1220.53 1795.32 3003.45 2740.09 1761.29 1509.08 ... 1421.25 1854.42 2641.66 2975.97 2462.39 2012.90 1847.38 1668.29 1865.04 2337.15
1 1118200 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 4768.48 5097.49 5064.49 6104.98 6482.67 6415.66 6979.02 6762.41 6562.11 7054.70
2 1794680 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 4.13 10785.00 39031.65 27206.71 33328.82 22304.75 17195.98 12884.41
3 1145360 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 4921.48 5745.22 6371.04 7379.91 6356.76 4961.04 4608.01 3751.57 3941.76 6191.93
4 413150 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 22414.12 23206.18 24951.71 33621.58 28793.45 25851.21 26829.80 28036.57 31775.77 39076.48
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58406 210490 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58407 257930 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58408 1180320 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 339.39 215.27 196.50 245.59 253.09 266.06 260.53 281.44
58409 397760 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

58410 rows × 122 columns

all_steam_games_peak_players_per_month.csv contains the peak concurrent players per month for each Steam game. The ID of the game is in the leftmost column, while the remaining 121 columns are the peak concurrent players for each month from July 2012 to July 2022.

In [4]:
Out[4]:
id July 2012 August 2012 September 2012 October 2012 November 2012 December 2012 January 2013 February 2013 March 2013 ... October 2021 November 2021 December 2021 January 2022 February 2022 March 2022 April 2022 May 2022 June 2022 July 2022
0 620 8857.0 5024.0 4131.0 2573.0 8555.0 7471.0 6366.0 3687.0 3005.0 ... 2615.0 6216.0 5813.0 6139.0 4892.0 3686.0 3562.0 3104.0 4433.0 4520.0
1 1118200 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 12749.0 9432.0 9053.0 10596.0 13056.0 11519.0 13019.0 13774.0 10078.0 11054.0
2 1794680 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 12.0 50847.0 77061.0 58323.0 68805.0 48081.0 34545.0 21058.0
3 1145360 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 13666.0 12607.0 11055.0 12295.0 12076.0 10310.0 9267.0 6097.0 10020.0 14044.0
4 413150 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 35687.0 35441.0 39992.0 48151.0 44044.0 38568.0 40444.0 42786.0 49366.0 54288.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58406 210490 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58407 257930 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
58408 1180320 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 926.0 413.0 391.0 552.0 505.0 514.0 498.0 523.0
58409 397760 NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

58410 rows × 122 columns

itch.io_games_info.csv contains all of the info collected for the 13968 itch.io games.

  • Name: The name of the game.
  • Price: The price of the game.
  • Platforms: The platforms that the game supports. Stored in a list.
  • Rating: The rating of the game.
  • Authors: The developers of the game. Stored in a list.
  • Genre: The genres of the game. Stored in a list.
  • Tags: The tags of the game. Stored in a list.
  • Number of Reviews: The number of reviews of the game.
  • Made with: The tools and software used in the development of the game. Stored in a list.
  • Average session: The average playtime of players of the game.
  • Languages: The available languages that the game is in. Stored in a list.
  • Inputs: The inputs that the game supports. Stored in a list.
  • Accessibility: The accessibility options the game provides. Stored in a list.
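In the table below, the Price column holds strings such as "$0.00" rather than numbers. As a hedged sketch of one way these could be made numeric, the dollar sign can be stripped before conversion. The "$4.99" value and the names prices and numeric_prices are made up for this illustration and are not taken from the dataset or the original notebook.

```python
import pandas as pd

# "$0.00" appears in the table; "$4.99" and the missing value are hypothetical.
prices = pd.Series(["$0.00", "$4.99", None])

# Remove the literal "$" (regex=False) and convert; unparseable values become NaN.
numeric_prices = pd.to_numeric(prices.str.replace("$", "", regex=False), errors="coerce")
print(numeric_prices)
```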
In [5]:
Out[5]:
Name Price Platforms Rating Authors Genre Tags Number of Reviews Made with Average session Languages Inputs Accessibility
0 ​Our Life: Beginnings & Always $0.00 ['Windows', 'macOS', 'Linux', 'Android'] 5.0 ['GBPatch'] ['Visual Novel', 'Interactive Fiction'] ['amare', 'Comedy', 'Dating Sim', 'Gay', 'LGBT... 3054.0 NaN NaN NaN NaN NaN
1 HoloCure $0.00 ['Windows'] 5.0 ['Kay Yu'] ['Action'] ['Fangame', 'hololive', 'Pixel Art', 'Roguelit... 2923.0 ['GameMaker: Studio'] About a half-hour ['English', 'Japanese'] ['Keyboard'] NaN
2 Friday Night Funkin' $0.00 ['Windows', 'macOS', 'Linux', 'HTML5'] 5.0 ['ninjamuffin99', 'PhantomArcade'] ['Rhythm'] ['2D'] 9536.0 ['OpenFL', 'IndieCade', 'Haxe'] About an hour NaN ['Dance pad'] NaN
3 Adventures With Anxiety! $0.00 ['HTML5'] 5.0 ['Nicky Case!'] ['Visual Novel'] ['Comedy', 'Mental Health', 'Narrative'] 3158.0 NaN NaN NaN NaN NaN
4 Butterfly Soup $0.00 ['Windows', 'macOS', 'Linux'] 5.0 ['Brianna Lei'] ['Visual Novel', 'Interactive Fiction'] ['2D', 'Anime', 'Female Protagonist', 'LGBT', ... 3276.0 ["Ren'Py"] A few seconds ['Czech', 'Persian', 'Japanese', 'Korean', 'Po... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
13963 casual carnival (prototype) $0.00 ['Windows', 'HTML5'] 5.0 ['olmewe'] ['Card Game', 'Puzzle'] ['2D', 'Arcade', 'Furry', 'pupy'] 7.0 ['Unity'] About a half-hour ['English'] ['Keyboard', 'Mouse'] NaN
13964 Cook'em Up $0.00 ['Windows', 'HTML5'] 5.0 ['Carbonara', 'Yanni', 'Aredhele', 'Rémi Cros'... ['Shooter'] NaN 7.0 ['Unity'] NaN NaN NaN NaN
13965 Super Funky Light Show - Full version $0.00 NaN 5.0 ['OwenSenior'] ['Shooter'] NaN 7.0 NaN NaN NaN NaN NaN
13966 A World in a Jar $0.00 ['Windows', 'macOS', 'Linux'] 5.0 ['Zeknir', 'Traincraft'] ['Puzzle'] ['Casual', 'Ludum Dare 38', 'Pixel Art', 'Sing... 7.0 NaN About a half-hour ['English'] ['Mouse', 'Touchscreen'] ['Interactive tutorial']
13967 TURDLE C64 $0.00 ['HTML5'] 5.0 ['Roysterini'] ['Puzzle'] ['Commodore 64', 'megastyle', 'Retro', 'silly'... 7.0 NaN NaN NaN NaN NaN

13968 rows × 13 columns

Data Cleaning

steam_df

The first dataset to be cleaned is steam_df.

Because the developers, publishers, languages, genres and tags columns were each collected as a list, some of the values in these columns are simply empty lists ("[]"). The rows with at least one such value are shown below.

In [6]:
Out[6]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
241 900883 The Elder Scrolls IV: Oblivion® Game of the Ye... 37004 35362 1642 95.56 1500000 1000000 2000000 0.000000 0.000000 14.99 16 Jun, 2009 ['Bethesda Game Studios®'] ['Bethesda Softworks'] ['English'] [] ['RPG', 'Open World', 'Fantasy', 'Singleplayer...
491 245550 Free to Play 10969 10430 539 95.09 10000 0 20000 1.983333 1.383333 0.00 19 Mar, 2014 ['Valve'] ['Valve'] ['English', 'French', 'Italian', 'German', 'Sp... [] ['Free to Play', 'Documentary', 'eSports', 'Ac...
915 241930 Middle-earth™: Shadow of Mordor™ 75269 69431 5838 92.24 7500000 5000000 10000000 20.200000 14.750000 9.99 30 Sep, 2014 ['Monolith Productions'] ['Warner Bros. Interactive Entertainment', 'Wa... ['English', 'French', 'Italian', 'German', 'Sp... [] ['Open World', 'Action', 'Fantasy', 'Adventure...
1721 701380 El Tango de la Muerte 225 221 4 98.22 10000 0 20000 0.000000 0.000000 4.99 24 Apr, 2018 ['Hernán Smicht', 'YIRA::'] ['Hernán Smicht'] ['English', 'Spanish - Spain', 'Simplified Chi... [] ['Rhythm', 'Well-Written', 'Music', 'Comedy', ...
2120 242550 Rayman Legends 5920 5385 535 90.96 350000 200000 500000 11.783333 11.983333 29.99 29 Aug, 2013 [] ['Ubisoft'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Adventure'] ['Platformer', 'Adventure', 'Action', 'Local C...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58165 252770 Vox 386 88 298 22.80 35000 20000 50000 1.833333 2.266667 0.00 12 Nov, 2013 [] [] ['English'] ['Action', 'Adventure', 'Indie', 'RPG', 'Early... ['RPG', 'Indie', 'Adventure', 'Action', 'Early...
58304 825430 Defiance 2050 - Beta 451 89 362 19.73 10000 0 20000 0.000000 0.000000 0.00 20 Apr, 2018 [] [] ['English'] [] []
58323 260510 World Basketball Tycoon 104 16 88 15.38 35000 20000 50000 0.000000 0.000000 2.99 18 Nov, 2013 [] ['Strategy First'] ['English', 'French', 'Italian', 'German', 'Sp... ['Simulation'] ['Simulation', 'Management', 'Basketball', 'Sp...
58357 218980 Patterns 154 23 131 14.94 10000 0 20000 0.000000 0.000000 0.00 NaN [] [] ['English'] ['Casual', 'Simulation', 'Strategy'] ['Casual', 'Simulation', 'Strategy', 'Sandbox'...
58404 1434500 MiniFarm 2020 25 0 25 0.00 10000 0 20000 0.000000 0.000000 0.00 11 Nov, 2020 ['indiegames3000'] ['indiegames3000'] ['English'] [] ['Farming Sim', 'Farming', 'Exploration', '2D ...

203 rows × 18 columns

These values can be replaced with NaN values, as they contain no information.
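The replacement described above can be sketched as follows, assuming the list columns hold strings so that an empty list is the exact string "[]". The two rows are shortened versions of rows from the table, and list_columns, sample and empty_mask are names chosen for this example.

```python
import pandas as pd

list_columns = ["developers", "publishers", "languages", "genres", "tags"]

# Shortened versions of two affected rows; the list values are abbreviated.
sample = pd.DataFrame({
    "name": ["Rayman Legends", "Defiance 2050 - Beta"],
    "developers": ["[]", "[]"],
    "publishers": ["['Ubisoft']", "[]"],
    "languages": ["['English']", "['English']"],
    "genres": ["['Action', 'Adventure']", "[]"],
    "tags": ["['Platformer']", "[]"],
})

# Rows where at least one list column is the empty-list string.
empty_mask = (sample[list_columns] == "[]").any(axis=1)
print(empty_mask.sum())

# mask() replaces every cell where the condition holds with NaN.
sample[list_columns] = sample[list_columns].mask(sample[list_columns] == "[]")
print(sample)
```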

In [7]:
Out[7]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
241 900883 The Elder Scrolls IV: Oblivion® Game of the Ye... 37004 35362 1642 95.56 1500000 1000000 2000000 0.000000 0.000000 14.99 16 Jun, 2009 ['Bethesda Game Studios®'] ['Bethesda Softworks'] ['English'] NaN ['RPG', 'Open World', 'Fantasy', 'Singleplayer...
491 245550 Free to Play 10969 10430 539 95.09 10000 0 20000 1.983333 1.383333 0.00 19 Mar, 2014 ['Valve'] ['Valve'] ['English', 'French', 'Italian', 'German', 'Sp... NaN ['Free to Play', 'Documentary', 'eSports', 'Ac...
915 241930 Middle-earth™: Shadow of Mordor™ 75269 69431 5838 92.24 7500000 5000000 10000000 20.200000 14.750000 9.99 30 Sep, 2014 ['Monolith Productions'] ['Warner Bros. Interactive Entertainment', 'Wa... ['English', 'French', 'Italian', 'German', 'Sp... NaN ['Open World', 'Action', 'Fantasy', 'Adventure...
1721 701380 El Tango de la Muerte 225 221 4 98.22 10000 0 20000 0.000000 0.000000 4.99 24 Apr, 2018 ['Hernán Smicht', 'YIRA::'] ['Hernán Smicht'] ['English', 'Spanish - Spain', 'Simplified Chi... NaN ['Rhythm', 'Well-Written', 'Music', 'Comedy', ...
2120 242550 Rayman Legends 5920 5385 535 90.96 350000 200000 500000 11.783333 11.983333 29.99 29 Aug, 2013 NaN ['Ubisoft'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Adventure'] ['Platformer', 'Adventure', 'Action', 'Local C...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58165 252770 Vox 386 88 298 22.80 35000 20000 50000 1.833333 2.266667 0.00 12 Nov, 2013 NaN NaN ['English'] ['Action', 'Adventure', 'Indie', 'RPG', 'Early... ['RPG', 'Indie', 'Adventure', 'Action', 'Early...
58304 825430 Defiance 2050 - Beta 451 89 362 19.73 10000 0 20000 0.000000 0.000000 0.00 20 Apr, 2018 NaN NaN ['English'] NaN NaN
58323 260510 World Basketball Tycoon 104 16 88 15.38 35000 20000 50000 0.000000 0.000000 2.99 18 Nov, 2013 NaN ['Strategy First'] ['English', 'French', 'Italian', 'German', 'Sp... ['Simulation'] ['Simulation', 'Management', 'Basketball', 'Sp...
58357 218980 Patterns 154 23 131 14.94 10000 0 20000 0.000000 0.000000 0.00 NaN NaN NaN ['English'] ['Casual', 'Simulation', 'Strategy'] ['Casual', 'Simulation', 'Strategy', 'Sandbox'...
58404 1434500 MiniFarm 2020 25 0 25 0.00 10000 0 20000 0.000000 0.000000 0.00 11 Nov, 2020 ['indiegames3000'] ['indiegames3000'] ['English'] NaN ['Farming Sim', 'Farming', 'Exploration', '2D ...

203 rows × 18 columns

Next, there are some games that have NaN listed as price. This is not due to these games being free-to-play, as other free-to-play games have 0.00 listed as their price. Rather, this is most likely due to a bug in SteamSpy's API, the API used to collect the data for the prices.

In [8]:
Out[8]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
103 1698960 Project Kat - Paper Lily Prologue 2999 2978 21 99.30 10000 0 20000 0.000000 0.00 NaN 15 Oct, 2021 ['Leef 6010'] ['Leef 6010'] ['English', 'Japanese'] ['Adventure', 'Free to Play', 'Indie', 'RPG'] ['Anime', 'Psychological Horror', 'Pixel Graph...
114 1806840 100 hidden frogs 3236 3205 31 99.04 10000 0 20000 0.000000 0.00 NaN 15 Nov, 2021 ['Anatoliy Loginovskikh'] ['Anatoliy Loginovskikh'] ['English'] ['Adventure', 'Casual', 'Free to Play', 'Indie'] ['Free to Play', 'Hidden Object', 'Hand-drawn'...
152 1985690 The Looker 6247 6110 137 97.81 10000 0 20000 0.000000 0.00 NaN 17 Jun, 2022 ['Subcreation Studio'] ['Subcreation Studio'] ['English'] ['Indie'] ['Puzzle', 'Parody ', 'Comedy', 'Funny', 'Expl...
177 1708870 第七号列车 - Train No. 7 4576 4482 94 97.95 10000 0 20000 0.000000 0.00 NaN 13 Aug, 2021 ['水野的四叶草'] ['Eternal Dream'] ['English', 'Simplified Chinese'] ['Casual', 'Free to Play', 'Indie', 'RPG'] ['Casual', 'RPG', 'Adventure', 'Relaxing', 'Sp...
222 1713610 Purrgatory 1862 1840 22 98.82 10000 0 20000 0.000000 0.00 NaN 22 Aug, 2021 ['Niv (Darvin Heo)'] ['Niv (Darvin Heo)'] ['English', 'Spanish - Spain'] ['Adventure', 'Casual', 'Free to Play', 'Indie'] ['Visual Novel', 'Cute', 'LGBTQ+', 'Point & Cl...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58181 1665460 eFootball™ 2022 37348 9683 27665 25.93 10000 0 20000 2.450000 0.70 NaN 29 Sep, 2021 ['Konami Digital Entertainment'] ['Konami Digital Entertainment'] ['English', 'French', 'Italian', 'German', 'Sp... ['Free to Play', 'Simulation', 'Sports'] ['Free to Play', 'Soccer', 'Sports', 'Simulati...
58213 1868030 Kilroy Was Here 7 0 7 0.00 10000 0 20000 0.000000 0.00 NaN 31 Jan, 2022 ['4A50 Studios'] ['4A50 Studios'] ['English'] ['Action', 'Adventure', 'Casual'] ['Adventure', 'Casual', 'Sci-fi', 'Action', 'I...
58243 794690 光明大陆 51 8 43 15.69 10000 0 20000 0.083333 0.15 NaN 22 Mar, 2018 ['Netease'] ['Netease'] ['English', 'Simplified Chinese'] ['Action', 'Adventure', 'Free to Play', 'Massi... ['Adventure', 'RPG', 'Massively Multiplayer', ...
58399 1668940 崩坏3 8437 1455 6982 17.25 10000 0 20000 0.000000 0.00 NaN 21 Oct, 2021 ['miHoYo Limited'] ['miHoYo Limited'] ['English', 'French', 'German', 'Japanese', 'S... ['Action', 'Adventure', 'Free to Play', 'RPG'] ['Anime', 'Action', 'RPG', 'Adventure', '3D Fi...
58408 1180320 三国杀 15824 2130 13694 13.46 10000 0 20000 0.000000 0.00 NaN 17 Dec, 2021 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu...

2123 rows × 18 columns

The SteamSpy API was also used to collect the owner numbers, as well as avg_playtime and median_playtime. However, avg_playtime and median_playtime seem unaffected by this bug, as some of the games with NaN as their price have non-zero values for avg_playtime and median_playtime. On the other hand, owners, min_owners and max_owners seem to be locked to a single value for all of these games.

In [9]:
Out[9]:
10000    2123
Name: owners, dtype: int64

As a result, we have no choice but to set the owners, min_owners and max_owners of the affected games as NaN, as these values could be inaccurate and could affect the distribution of the data.
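A minimal sketch of this step is shown below, using three rows reduced to the relevant columns. The names sample, owner_columns and missing_price are assumptions for this example. Series.mask is used so that the integer columns are cleanly converted to floats when NaN is introduced, which matches the float owner values seen in the later outputs.

```python
import numpy as np
import pandas as pd

# Portal 2 has a valid price; the other two rows have NaN prices.
sample = pd.DataFrame({
    "name": ["Portal 2", "The Looker", "eFootball™ 2022"],
    "owners": [15000000, 10000, 10000],
    "min_owners": [10000000, 0, 0],
    "max_owners": [20000000, 20000, 20000],
    "price": [9.99, np.nan, np.nan],
})
owner_columns = ["owners", "min_owners", "max_owners"]
missing_price = sample["price"].isna()

# Check that every game without a price shares the same owners value.
print(sample.loc[missing_price, "owners"].value_counts())

# Blank out the owner estimates for the affected games only.
for column in owner_columns:
    sample[column] = sample[column].mask(missing_price)
print(sample)
```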

In [10]:
Out[10]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
103 1698960 Project Kat - Paper Lily Prologue 2999 2978 21 99.30 NaN NaN NaN 0.000000 0.00 NaN 15 Oct, 2021 ['Leef 6010'] ['Leef 6010'] ['English', 'Japanese'] ['Adventure', 'Free to Play', 'Indie', 'RPG'] ['Anime', 'Psychological Horror', 'Pixel Graph...
114 1806840 100 hidden frogs 3236 3205 31 99.04 NaN NaN NaN 0.000000 0.00 NaN 15 Nov, 2021 ['Anatoliy Loginovskikh'] ['Anatoliy Loginovskikh'] ['English'] ['Adventure', 'Casual', 'Free to Play', 'Indie'] ['Free to Play', 'Hidden Object', 'Hand-drawn'...
152 1985690 The Looker 6247 6110 137 97.81 NaN NaN NaN 0.000000 0.00 NaN 17 Jun, 2022 ['Subcreation Studio'] ['Subcreation Studio'] ['English'] ['Indie'] ['Puzzle', 'Parody ', 'Comedy', 'Funny', 'Expl...
177 1708870 第七号列车 - Train No. 7 4576 4482 94 97.95 NaN NaN NaN 0.000000 0.00 NaN 13 Aug, 2021 ['水野的四叶草'] ['Eternal Dream'] ['English', 'Simplified Chinese'] ['Casual', 'Free to Play', 'Indie', 'RPG'] ['Casual', 'RPG', 'Adventure', 'Relaxing', 'Sp...
222 1713610 Purrgatory 1862 1840 22 98.82 NaN NaN NaN 0.000000 0.00 NaN 22 Aug, 2021 ['Niv (Darvin Heo)'] ['Niv (Darvin Heo)'] ['English', 'Spanish - Spain'] ['Adventure', 'Casual', 'Free to Play', 'Indie'] ['Visual Novel', 'Cute', 'LGBTQ+', 'Point & Cl...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58181 1665460 eFootball™ 2022 37348 9683 27665 25.93 NaN NaN NaN 2.450000 0.70 NaN 29 Sep, 2021 ['Konami Digital Entertainment'] ['Konami Digital Entertainment'] ['English', 'French', 'Italian', 'German', 'Sp... ['Free to Play', 'Simulation', 'Sports'] ['Free to Play', 'Soccer', 'Sports', 'Simulati...
58213 1868030 Kilroy Was Here 7 0 7 0.00 NaN NaN NaN 0.000000 0.00 NaN 31 Jan, 2022 ['4A50 Studios'] ['4A50 Studios'] ['English'] ['Action', 'Adventure', 'Casual'] ['Adventure', 'Casual', 'Sci-fi', 'Action', 'I...
58243 794690 光明大陆 51 8 43 15.69 NaN NaN NaN 0.083333 0.15 NaN 22 Mar, 2018 ['Netease'] ['Netease'] ['English', 'Simplified Chinese'] ['Action', 'Adventure', 'Free to Play', 'Massi... ['Adventure', 'RPG', 'Massively Multiplayer', ...
58399 1668940 崩坏3 8437 1455 6982 17.25 NaN NaN NaN 0.000000 0.00 NaN 21 Oct, 2021 ['miHoYo Limited'] ['miHoYo Limited'] ['English', 'French', 'German', 'Japanese', 'S... ['Action', 'Adventure', 'Free to Play', 'RPG'] ['Anime', 'Action', 'RPG', 'Adventure', '3D Fi...
58408 1180320 三国杀 15824 2130 13694 13.46 NaN NaN NaN 0.000000 0.00 NaN 17 Dec, 2021 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu...

2123 rows × 18 columns

The date column was collected as strings (for example, "18 Apr, 2011"), so it has to be converted to datetime format to allow for time series analysis.
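As an illustration of the conversion, the sketch below parses a few date strings in the format seen in the table with pandas.to_datetime. The explicit format string and errors="coerce" are choices made for this example, and the abbreviated month names are assumed to be English. Missing dates, such as the one for Patterns, become NaT.

```python
import numpy as np
import pandas as pd

# Two dates from the table plus a missing value.
dates = pd.Series(["18 Apr, 2011", "7 Mar, 2014", np.nan])

# %d accepts one- or two-digit days; %b matches abbreviated month names.
parsed = pd.to_datetime(dates, format="%d %b, %Y", errors="coerce")
print(parsed)
```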

In [11]:
Out[11]:
id name total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
0 620 Portal 2 302395 298749 3646 98.79 15000000.0 10000000.0 20000000.0 16.633333 8.266667 9.99 2011-04-18 ['Valve'] ['Valve'] ['English', 'French', 'German', 'Spanish - Spa... ['Action', 'Adventure'] ['Platformer', 'Puzzle', 'Dark Humor', 'First-...
1 1118200 People Playground 129491 128053 1438 98.89 3500000.0 2000000.0 5000000.0 30.866667 12.166667 9.99 2019-07-23 ['mestiez'] ['Studio Minus'] ['English'] ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod...
2 1794680 Vampire Survivors 114325 113067 1258 98.90 3500000.0 2000000.0 5000000.0 22.616667 14.700000 2.99 2021-12-17 ['poncle'] ['poncle'] ['English'] ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet...
3 1145360 Hades 194006 191332 2674 98.62 7500000.0 5000000.0 10000000.0 35.166667 18.866667 24.99 2020-09-17 ['Supergiant Games'] ['Supergiant Games'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac...
4 413150 Stardew Valley 485586 476588 8998 98.15 15000000.0 10000000.0 20000000.0 66.883333 30.433333 14.99 2016-02-26 ['ConcernedApe'] ['ConcernedApe'] ['English', 'German', 'Spanish - Spain', 'Japa... ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 Construction Machines 2014 311 36 275 11.58 35000.0 20000.0 50000.0 0.483333 0.483333 6.99 2014-03-28 ['GameCask'] ['GameCask'] ['English', 'French', 'German', 'Polish'] ['Simulation'] ['Simulation', 'Building', 'Singleplayer']
58406 210490 Fray 50 1 49 2.00 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2012-06-19 ['Brain Candy'] ['Brain Candy'] ['English'] ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie']
58407 257930 Race To Mars 261 20 241 7.66 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2014-03-07 ['INTERMARUM', 'ONE MORE LEVEL'] ['ONE MORE LEVEL'] ['English', 'Polish'] ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '...
58408 1180320 三国杀 15824 2130 13694 13.46 NaN NaN NaN 0.000000 0.000000 NaN 2021-12-17 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu...
58409 397760 Urban War Defense 79 2 77 2.53 10000.0 0.0 20000.0 20.700000 20.700000 5.99 2017-07-07 ['Budgie Games'] ['Budgie Games'] ['English'] ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'...

58410 rows × 18 columns

To compare indie and non-indie games, each game has to be classified as either "indie" or "non-indie". To do this, we can refer to the genres and tags of a game: if either of them includes "Indie", the game is classified as an indie game; otherwise it is a non-indie game. A new column, is_indie, stores this classification.
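The classification rule can be sketched as below, assuming genres and tags are still stored as list-like strings with NaN for missing values. The helper has_indie and the third row are hypothetical, added only to show how a missing genres value is handled. The first two rows are shortened from the table.

```python
import ast

import numpy as np
import pandas as pd

def has_indie(value):
    """Return True if a list-like string such as "['Indie', 'RPG']" contains 'Indie'."""
    # Missing genres or tags (NaN) cannot contain the label.
    if not isinstance(value, str):
        return False
    return "Indie" in ast.literal_eval(value)

sample = pd.DataFrame({
    "name": ["Portal 2", "Fray", "Hypothetical game"],
    "genres": ["['Action', 'Adventure']", "['Action', 'Strategy', 'Indie']", np.nan],
    "tags": ["['Platformer', 'Puzzle']", "['Strategy', 'Action', 'Indie']", "['Indie']"],
})

# A game counts as indie if "Indie" appears in its genres or in its tags.
is_indie = sample["genres"].apply(has_indie) | sample["tags"].apply(has_indie)

# Place the new column right after the name column.
sample.insert(1, "is_indie", is_indie)
print(sample)
```

Matching on the parsed list rather than on the raw string avoids false positives from other labels that merely contain the substring "Indie".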

In [12]:
Out[12]:
id name is_indie total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags
0 620 Portal 2 False 302395 298749 3646 98.79 15000000.0 10000000.0 20000000.0 16.633333 8.266667 9.99 2011-04-18 ['Valve'] ['Valve'] ['English', 'French', 'German', 'Spanish - Spa... ['Action', 'Adventure'] ['Platformer', 'Puzzle', 'Dark Humor', 'First-...
1 1118200 People Playground True 129491 128053 1438 98.89 3500000.0 2000000.0 5000000.0 30.866667 12.166667 9.99 2019-07-23 ['mestiez'] ['Studio Minus'] ['English'] ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod...
2 1794680 Vampire Survivors True 114325 113067 1258 98.90 3500000.0 2000000.0 5000000.0 22.616667 14.700000 2.99 2021-12-17 ['poncle'] ['poncle'] ['English'] ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet...
3 1145360 Hades True 194006 191332 2674 98.62 7500000.0 5000000.0 10000000.0 35.166667 18.866667 24.99 2020-09-17 ['Supergiant Games'] ['Supergiant Games'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac...
4 413150 Stardew Valley True 485586 476588 8998 98.15 15000000.0 10000000.0 20000000.0 66.883333 30.433333 14.99 2016-02-26 ['ConcernedApe'] ['ConcernedApe'] ['English', 'German', 'Spanish - Spain', 'Japa... ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 Construction Machines 2014 False 311 36 275 11.58 35000.0 20000.0 50000.0 0.483333 0.483333 6.99 2014-03-28 ['GameCask'] ['GameCask'] ['English', 'French', 'German', 'Polish'] ['Simulation'] ['Simulation', 'Building', 'Singleplayer']
58406 210490 Fray True 50 1 49 2.00 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2012-06-19 ['Brain Candy'] ['Brain Candy'] ['English'] ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie']
58407 257930 Race To Mars True 261 20 241 7.66 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2014-03-07 ['INTERMARUM', 'ONE MORE LEVEL'] ['ONE MORE LEVEL'] ['English', 'Polish'] ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '...
58408 1180320 三国杀 False 15824 2130 13694 13.46 NaN NaN NaN 0.000000 0.000000 NaN 2021-12-17 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu...
58409 397760 Urban War Defense True 79 2 77 2.53 10000.0 0.0 20000.0 20.700000 20.700000 5.99 2017-07-07 ['Budgie Games'] ['Budgie Games'] ['English'] ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'...

58410 rows × 19 columns

Unfortunately, Steam does not publish the true number of owners of each game. SteamSpy can therefore only estimate a range for the number of owners, which is why there are 3 owner columns: owners, min_owners and max_owners. This means that owners is an imprecise measure of the popularity and success of a game, so we use total_reviews for that purpose instead. However, owners, min_owners and max_owners are still useful for sorting the games into broad popularity categories, allowing us to compare Steam games of different popularity.

In [13]:
Out[13]:
id name is_indie total_reviews positive_reviews negative_reviews rating owners min_owners max_owners avg_playtime median_playtime price date developers publishers languages genres tags owners_binned
0 620 Portal 2 False 302395 298749 3646 98.79 15000000.0 10000000.0 20000000.0 16.633333 8.266667 9.99 2011-04-18 ['Valve'] ['Valve'] ['English', 'French', 'German', 'Spanish - Spa... ['Action', 'Adventure'] ['Platformer', 'Puzzle', 'Dark Humor', 'First-... Highest
1 1118200 People Playground True 129491 128053 1438 98.89 3500000.0 2000000.0 5000000.0 30.866667 12.166667 9.99 2019-07-23 ['mestiez'] ['Studio Minus'] ['English'] ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod... Highest
2 1794680 Vampire Survivors True 114325 113067 1258 98.90 3500000.0 2000000.0 5000000.0 22.616667 14.700000 2.99 2021-12-17 ['poncle'] ['poncle'] ['English'] ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet... Highest
3 1145360 Hades True 194006 191332 2674 98.62 7500000.0 5000000.0 10000000.0 35.166667 18.866667 24.99 2020-09-17 ['Supergiant Games'] ['Supergiant Games'] ['English', 'French', 'Italian', 'German', 'Sp... ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac... Highest
4 413150 Stardew Valley True 485586 476588 8998 98.15 15000000.0 10000000.0 20000000.0 66.883333 30.433333 14.99 2016-02-26 ['ConcernedApe'] ['ConcernedApe'] ['English', 'German', 'Spanish - Spain', 'Japa... ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ... Highest
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 Construction Machines 2014 False 311 36 275 11.58 35000.0 20000.0 50000.0 0.483333 0.483333 6.99 2014-03-28 ['GameCask'] ['GameCask'] ['English', 'French', 'German', 'Polish'] ['Simulation'] ['Simulation', 'Building', 'Singleplayer'] Low
58406 210490 Fray True 50 1 49 2.00 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2012-06-19 ['Brain Candy'] ['Brain Candy'] ['English'] ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie'] Lowest
58407 257930 Race To Mars True 261 20 241 7.66 10000.0 0.0 20000.0 0.000000 0.000000 0.00 2014-03-07 ['INTERMARUM', 'ONE MORE LEVEL'] ['ONE MORE LEVEL'] ['English', 'Polish'] ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '... Lowest
58408 1180320 三国杀 False 15824 2130 13694 13.46 NaN NaN NaN 0.000000 0.000000 NaN 2021-12-17 ['杭州游卡网络技术有限公司'] ['杭州游卡网络技术有限公司'] ['English', 'Simplified Chinese'] ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu... NaN
58409 397760 Urban War Defense True 79 2 77 2.53 10000.0 0.0 20000.0 20.700000 20.700000 5.99 2017-07-07 ['Budgie Games'] ['Budgie Games'] ['English'] ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'... Lowest

58410 rows × 20 columns

The games have been binned into 4 categories: Highest, High, Low and Lowest. The bins are unequal in size and shrink sharply from Lowest to Highest.
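One way to produce such bins is pd.cut on the owners estimate. The sketch below is only illustrative: the bin edges are hypothetical, since the notebook's actual thresholds are not shown, and toy_df stands in for steam_df.

```python
import numpy as np
import pandas as pd

# Toy owners estimates, including one game with no estimate.
toy_df = pd.DataFrame(
    {"owners": [15_000_000.0, 3_500_000.0, 350_000.0, 35_000.0, 10_000.0, np.nan]}
)

# Hypothetical edges (assumption): each interval is open on the left, closed on the right.
edges = [0, 20_000, 200_000, 2_000_000, np.inf]
labels = ["Lowest", "Low", "High", "Highest"]

# Games without an owners estimate stay NaN instead of being forced into a bin.
toy_df["owners_binned"] = pd.cut(toy_df["owners"], bins=edges, labels=labels)

# Number of games per bin, as in the value counts shown in the notebook.
counts = toy_df["owners_binned"].value_counts()
print(counts)
```

Note that pd.cut returns a categorical column, whereas the summary of steam_df lists owners_binned as object, so the original cell may have built the column differently.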

In [14]:
Out[14]:
Lowest     39264
Low        10676
High        5359
Highest      988
Name: owners_binned, dtype: int64
In [15]:

Next, we can add the number of developers, publishers and languages of each game as the columns developers_count, publishers_count and languages_count respectively.
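A possible way to derive these count columns is sketched below. It assumes the list columns are stored as list-like strings and uses the nullable Int64 dtype so that games with a missing list keep a missing count; count_items and toy_df are hypothetical names.

```python
import ast

import pandas as pd

# Toy stand-in for steam_df with one missing developers entry.
toy_df = pd.DataFrame({
    "developers": ["['Valve']", "['INTERMARUM', 'ONE MORE LEVEL']", None],
    "languages": ["['English', 'French']", "['English', 'Polish']", "['English']"],
})


def count_items(cell):
    """Count the entries of a list-like string; propagate missing values."""
    if pd.isna(cell):
        return pd.NA
    return len(ast.literal_eval(cell))


for col in ["developers", "languages"]:
    # Int64 (capital I) is pandas' nullable integer dtype, so <NA> is allowed.
    toy_df[f"{col}_count"] = toy_df[col].map(count_items).astype("Int64")

print(toy_df)
```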

In [16]:
Out[16]:
id name is_indie total_reviews positive_reviews negative_reviews rating owners min_owners max_owners ... date developers developers_count publishers publishers_count languages languages_count genres tags owners_binned
0 620 Portal 2 False 302395 298749 3646 98.79 15000000.0 10000000.0 20000000.0 ... 2011-04-18 ['Valve'] 1 ['Valve'] 1 ['English', 'French', 'German', 'Spanish - Spa... 22 ['Action', 'Adventure'] ['Platformer', 'Puzzle', 'Dark Humor', 'First-... Highest
1 1118200 People Playground True 129491 128053 1438 98.89 3500000.0 2000000.0 5000000.0 ... 2019-07-23 ['mestiez'] 1 ['Studio Minus'] 1 ['English'] 1 ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod... Highest
2 1794680 Vampire Survivors True 114325 113067 1258 98.90 3500000.0 2000000.0 5000000.0 ... 2021-12-17 ['poncle'] 1 ['poncle'] 1 ['English'] 1 ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet... Highest
3 1145360 Hades True 194006 191332 2674 98.62 7500000.0 5000000.0 10000000.0 ... 2020-09-17 ['Supergiant Games'] 1 ['Supergiant Games'] 1 ['English', 'French', 'Italian', 'German', 'Sp... 11 ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac... Highest
4 413150 Stardew Valley True 485586 476588 8998 98.15 15000000.0 10000000.0 20000000.0 ... 2016-02-26 ['ConcernedApe'] 1 ['ConcernedApe'] 1 ['English', 'German', 'Spanish - Spain', 'Japa... 12 ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ... Highest
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 Construction Machines 2014 False 311 36 275 11.58 35000.0 20000.0 50000.0 ... 2014-03-28 ['GameCask'] 1 ['GameCask'] 1 ['English', 'French', 'German', 'Polish'] 4 ['Simulation'] ['Simulation', 'Building', 'Singleplayer'] Low
58406 210490 Fray True 50 1 49 2.00 10000.0 0.0 20000.0 ... 2012-06-19 ['Brain Candy'] 1 ['Brain Candy'] 1 ['English'] 1 ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie'] Lowest
58407 257930 Race To Mars True 261 20 241 7.66 10000.0 0.0 20000.0 ... 2014-03-07 ['INTERMARUM', 'ONE MORE LEVEL'] 2 ['ONE MORE LEVEL'] 1 ['English', 'Polish'] 2 ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '... Lowest
58408 1180320 三国杀 False 15824 2130 13694 13.46 NaN NaN NaN ... 2021-12-17 ['杭州游卡网络技术有限公司'] 1 ['杭州游卡网络技术有限公司'] 1 ['English', 'Simplified Chinese'] 2 ['Strategy'] ['Sexual Content', 'Psychological Horror', 'Nu... NaN
58409 397760 Urban War Defense True 79 2 77 2.53 10000.0 0.0 20000.0 ... 2017-07-07 ['Budgie Games'] 1 ['Budgie Games'] 1 ['English'] 1 ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'... Lowest

58410 rows × 23 columns

Finally, we have the summary of steam_df. None of the columns has many NaN values: the most is 2123 (58410 - 56287), in the price column, the three owner columns and owners_binned. Therefore, rows with NaN values can be dropped if required.

In [17]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58410 entries, 0 to 58409
Data columns (total 23 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   id                58410 non-null  int64         
 1   name              58410 non-null  object        
 2   is_indie          58410 non-null  bool          
 3   total_reviews     58410 non-null  int64         
 4   positive_reviews  58410 non-null  int64         
 5   negative_reviews  58410 non-null  int64         
 6   rating            58410 non-null  float64       
 7   owners            56287 non-null  float64       
 8   min_owners        56287 non-null  float64       
 9   max_owners        56287 non-null  float64       
 10  avg_playtime      58410 non-null  float64       
 11  median_playtime   58410 non-null  float64       
 12  price             56287 non-null  float64       
 13  date              58309 non-null  datetime64[ns]
 14  developers        58271 non-null  object        
 15  developers_count  58271 non-null  Int64         
 16  publishers        58322 non-null  object        
 17  publishers_count  58322 non-null  Int64         
 18  languages         58406 non-null  object        
 19  languages_count   58406 non-null  Int64         
 20  genres            58338 non-null  object        
 21  tags              58394 non-null  object        
 22  owners_binned     56287 non-null  object        
dtypes: Int64(3), bool(1), datetime64[ns](1), float64(7), int64(4), object(7)
memory usage: 10.0+ MB

avg_players_df & peak_players_df

avg_players_df contains a lot of NaN values, either because of missing data or because the game had not yet been released in that month. In both cases, we replace the NaN values with 0.

The dates have to be converted to the datetime format to allow time series analysis.

The is_indie and owners_binned columns from steam_df are then added, so that we can compare indie and non-indie games, as well as Steam games of different popularity.
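These three steps could look roughly like the sketch below. The shape of the raw data is an assumption (one row per game id, one string-labelled column per month), and raw_avg and steam_toy are hypothetical stand-ins for the real frames.

```python
import numpy as np
import pandas as pd

# Assumed raw layout: index = game id, columns = month labels stored as strings.
raw_avg = pd.DataFrame(
    {"2012-07-01": [4167.67, np.nan], "2012-08-01": [2480.58, np.nan]},
    index=pd.Index([620, 1118200], name="id"),
)
steam_toy = pd.DataFrame({
    "id": [620, 1118200],
    "is_indie": [False, True],
    "owners_binned": ["Highest", "Highest"],
})

# 1. Treat missing months as zero concurrent players.
avg_players = raw_avg.fillna(0)

# 2. Convert the month labels from strings to datetimes.
avg_players.columns = pd.to_datetime(avg_players.columns)

# 3. Attach is_indie and owners_binned by matching on the game id.
avg_players = steam_toy.merge(avg_players, left_on="id", right_index=True, how="left")
print(avg_players)
```

The same steps would apply to peak_players_df.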

In [18]:
Out[18]:
id is_indie owners_binned 2012-07-01 00:00:00 2012-08-01 00:00:00 2012-09-01 00:00:00 2012-10-01 00:00:00 2012-11-01 00:00:00 2012-12-01 00:00:00 2013-01-01 00:00:00 ... 2021-10-01 00:00:00 2021-11-01 00:00:00 2021-12-01 00:00:00 2022-01-01 00:00:00 2022-02-01 00:00:00 2022-03-01 00:00:00 2022-04-01 00:00:00 2022-05-01 00:00:00 2022-06-01 00:00:00 2022-07-01 00:00:00
0 620 False Highest 4167.67 2480.58 1853.0 1220.53 1795.32 3003.45 2740.09 ... 1421.25 1854.42 2641.66 2975.97 2462.39 2012.90 1847.38 1668.29 1865.04 2337.15
1 1118200 True Highest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 4768.48 5097.49 5064.49 6104.98 6482.67 6415.66 6979.02 6762.41 6562.11 7054.70
2 1794680 True Highest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 4.13 10785.00 39031.65 27206.71 33328.82 22304.75 17195.98 12884.41
3 1145360 True Highest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 4921.48 5745.22 6371.04 7379.91 6356.76 4961.04 4608.01 3751.57 3941.76 6191.93
4 413150 True Highest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 22414.12 23206.18 24951.71 33621.58 28793.45 25851.21 26829.80 28036.57 31775.77 39076.48
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 False Low 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
58406 210490 True Lowest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
58407 257930 True Lowest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
58408 1180320 False NaN 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 339.39 215.27 196.50 245.59 253.09 266.06 260.53 281.44
58409 397760 True Lowest 0.00 0.00 0.0 0.00 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

58410 rows × 124 columns

A similar process can be done for peak_players_df.

In [19]:
Out[19]:
id is_indie owners_binned 2012-07-01 00:00:00 2012-08-01 00:00:00 2012-09-01 00:00:00 2012-10-01 00:00:00 2012-11-01 00:00:00 2012-12-01 00:00:00 2013-01-01 00:00:00 ... 2021-10-01 00:00:00 2021-11-01 00:00:00 2021-12-01 00:00:00 2022-01-01 00:00:00 2022-02-01 00:00:00 2022-03-01 00:00:00 2022-04-01 00:00:00 2022-05-01 00:00:00 2022-06-01 00:00:00 2022-07-01 00:00:00
0 620 False Highest 8857.0 5024.0 4131.0 2573.0 8555.0 7471.0 6366.0 ... 2615.0 6216.0 5813.0 6139.0 4892.0 3686.0 3562.0 3104.0 4433.0 4520.0
1 1118200 True Highest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 12749.0 9432.0 9053.0 10596.0 13056.0 11519.0 13019.0 13774.0 10078.0 11054.0
2 1794680 True Highest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 12.0 50847.0 77061.0 58323.0 68805.0 48081.0 34545.0 21058.0
3 1145360 True Highest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 13666.0 12607.0 11055.0 12295.0 12076.0 10310.0 9267.0 6097.0 10020.0 14044.0
4 413150 True Highest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 35687.0 35441.0 39992.0 48151.0 44044.0 38568.0 40444.0 42786.0 49366.0 54288.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58405 252050 False Low 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
58406 210490 True Lowest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
58407 257930 True Lowest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
58408 1180320 False NaN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 926.0 413.0 391.0 552.0 505.0 514.0 498.0 523.0
58409 397760 True Lowest 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

58410 rows × 124 columns

itchio_df

Less cleaning is needed for itchio_df than for steam_df.

The Price column was collected as strings, so it has to be converted to floats.
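A hedged sketch of such a conversion is given below. The exact format of the scraped price strings is not shown in the notebook, so the sample values (a leading currency symbol, a bare number, a missing value) are assumptions.

```python
import pandas as pd

# Assumed examples of scraped price strings.
toy_prices = pd.Series(["$4.99", "0", "$12.00", None], name="Price")

# Drop everything except digits and the decimal point, then parse as numbers.
cleaned = toy_prices.str.replace(r"[^0-9.]", "", regex=True)

# errors="coerce" turns anything that still cannot be parsed into NaN.
price = pd.to_numeric(cleaned, errors="coerce")
print(price)
```

Coercing unparseable strings to NaN keeps the conversion from failing on unexpected values, at the cost of having to check how many NaN values were introduced.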

In [20]:
Out[20]:
Name Price Platforms Rating Authors Genre Tags Number of Reviews Made with Average session Languages Inputs Accessibility
0 ​Our Life: Beginnings & Always 0.0 ['Windows', 'macOS', 'Linux', 'Android'] 5.0 ['GBPatch'] ['Visual Novel', 'Interactive Fiction'] ['amare', 'Comedy', 'Dating Sim', 'Gay', 'LGBT... 3054.0 NaN NaN NaN NaN NaN
1 HoloCure 0.0 ['Windows'] 5.0 ['Kay Yu'] ['Action'] ['Fangame', 'hololive', 'Pixel Art', 'Roguelit... 2923.0 ['GameMaker: Studio'] About a half-hour ['English', 'Japanese'] ['Keyboard'] NaN
2 Friday Night Funkin' 0.0 ['Windows', 'macOS', 'Linux', 'HTML5'] 5.0 ['ninjamuffin99', 'PhantomArcade'] ['Rhythm'] ['2D'] 9536.0 ['OpenFL', 'IndieCade', 'Haxe'] About an hour NaN ['Dance pad'] NaN
3 Adventures With Anxiety! 0.0 ['HTML5'] 5.0 ['Nicky Case!'] ['Visual Novel'] ['Comedy', 'Mental Health', 'Narrative'] 3158.0 NaN NaN NaN NaN NaN
4 Butterfly Soup 0.0 ['Windows', 'macOS', 'Linux'] 5.0 ['Brianna Lei'] ['Visual Novel', 'Interactive Fiction'] ['2D', 'Anime', 'Female Protagonist', 'LGBT', ... 3276.0 ["Ren'Py"] A few seconds ['Czech', 'Persian', 'Japanese', 'Korean', 'Po... NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ...
13963 casual carnival (prototype) 0.0 ['Windows', 'HTML5'] 5.0 ['olmewe'] ['Card Game', 'Puzzle'] ['2D', 'Arcade', 'Furry', 'pupy'] 7.0 ['Unity'] About a half-hour ['English'] ['Keyboard', 'Mouse'] NaN
13964 Cook'em Up 0.0 ['Windows', 'HTML5'] 5.0 ['Carbonara', 'Yanni', 'Aredhele', 'Rémi Cros'... ['Shooter'] NaN 7.0 ['Unity'] NaN NaN NaN NaN
13965 Super Funky Light Show - Full version 0.0 NaN 5.0 ['OwenSenior'] ['Shooter'] NaN 7.0 NaN NaN NaN NaN NaN
13966 A World in a Jar 0.0 ['Windows', 'macOS', 'Linux'] 5.0 ['Zeknir', 'Traincraft'] ['Puzzle'] ['Casual', 'Ludum Dare 38', 'Pixel Art', 'Sing... 7.0 NaN About a half-hour ['English'] ['Mouse', 'Touchscreen'] ['Interactive tutorial']
13967 TURDLE C64 0.0 ['HTML5'] 5.0 ['Roysterini'] ['Puzzle'] ['Commodore 64', 'megastyle', 'Retro', 'silly'... 7.0 NaN NaN NaN NaN NaN

13968 rows × 13 columns

As with steam_df, we can add columns for the number of platforms, tools, languages, inputs and accessibility options of a game, as the columns platforms_count, tools_count, languages_count, inputs_count and accessibility_count respectively.

In [21]:
Out[21]:
Name Price Platforms platforms_count Rating Authors Genre Tags Number of Reviews Made with tools_count Average session Languages languages_count Inputs inputs_count Accessibility accessibility_count
0 ​Our Life: Beginnings & Always 0.0 ['Windows', 'macOS', 'Linux', 'Android'] 4 5.0 ['GBPatch'] ['Visual Novel', 'Interactive Fiction'] ['amare', 'Comedy', 'Dating Sim', 'Gay', 'LGBT... 3054.0 NaN <NA> NaN NaN <NA> NaN <NA> NaN <NA>
1 HoloCure 0.0 ['Windows'] 1 5.0 ['Kay Yu'] ['Action'] ['Fangame', 'hololive', 'Pixel Art', 'Roguelit... 2923.0 ['GameMaker: Studio'] 1 About a half-hour ['English', 'Japanese'] 2 ['Keyboard'] 1 NaN <NA>
2 Friday Night Funkin' 0.0 ['Windows', 'macOS', 'Linux', 'HTML5'] 4 5.0 ['ninjamuffin99', 'PhantomArcade'] ['Rhythm'] ['2D'] 9536.0 ['OpenFL', 'IndieCade', 'Haxe'] 3 About an hour NaN <NA> ['Dance pad'] 1 NaN <NA>
3 Adventures With Anxiety! 0.0 ['HTML5'] 1 5.0 ['Nicky Case!'] ['Visual Novel'] ['Comedy', 'Mental Health', 'Narrative'] 3158.0 NaN <NA> NaN NaN <NA> NaN <NA> NaN <NA>
4 Butterfly Soup 0.0 ['Windows', 'macOS', 'Linux'] 3 5.0 ['Brianna Lei'] ['Visual Novel', 'Interactive Fiction'] ['2D', 'Anime', 'Female Protagonist', 'LGBT', ... 3276.0 ["Ren'Py"] 1 A few seconds ['Czech', 'Persian', 'Japanese', 'Korean', 'Po... 7 NaN <NA> NaN <NA>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13963 casual carnival (prototype) 0.0 ['Windows', 'HTML5'] 2 5.0 ['olmewe'] ['Card Game', 'Puzzle'] ['2D', 'Arcade', 'Furry', 'pupy'] 7.0 ['Unity'] 1 About a half-hour ['English'] 1 ['Keyboard', 'Mouse'] 2 NaN <NA>
13964 Cook'em Up 0.0 ['Windows', 'HTML5'] 2 5.0 ['Carbonara', 'Yanni', 'Aredhele', 'Rémi Cros'... ['Shooter'] NaN 7.0 ['Unity'] 1 NaN NaN <NA> NaN <NA> NaN <NA>
13965 Super Funky Light Show - Full version 0.0 NaN <NA> 5.0 ['OwenSenior'] ['Shooter'] NaN 7.0 NaN <NA> NaN NaN <NA> NaN <NA> NaN <NA>
13966 A World in a Jar 0.0 ['Windows', 'macOS', 'Linux'] 3 5.0 ['Zeknir', 'Traincraft'] ['Puzzle'] ['Casual', 'Ludum Dare 38', 'Pixel Art', 'Sing... 7.0 NaN <NA> About a half-hour ['English'] 1 ['Mouse', 'Touchscreen'] 2 ['Interactive tutorial'] 1
13967 TURDLE C64 0.0 ['HTML5'] 1 5.0 ['Roysterini'] ['Puzzle'] ['Commodore 64', 'megastyle', 'Retro', 'silly'... 7.0 NaN <NA> NaN NaN <NA> NaN <NA> NaN <NA>

13968 rows × 18 columns

NaN values in these count columns can be replaced with 0.

In [22]:
Out[22]:
Name Price Platforms platforms_count Rating Authors Genre Tags Number of Reviews Made with tools_count Average session Languages languages_count Inputs inputs_count Accessibility accessibility_count
0 ​Our Life: Beginnings & Always 0.0 ['Windows', 'macOS', 'Linux', 'Android'] 4 5.0 ['GBPatch'] ['Visual Novel', 'Interactive Fiction'] ['amare', 'Comedy', 'Dating Sim', 'Gay', 'LGBT... 3054.0 NaN 0 NaN NaN 0 NaN 0 NaN 0
1 HoloCure 0.0 ['Windows'] 1 5.0 ['Kay Yu'] ['Action'] ['Fangame', 'hololive', 'Pixel Art', 'Roguelit... 2923.0 ['GameMaker: Studio'] 1 About a half-hour ['English', 'Japanese'] 2 ['Keyboard'] 1 NaN 0
2 Friday Night Funkin' 0.0 ['Windows', 'macOS', 'Linux', 'HTML5'] 4 5.0 ['ninjamuffin99', 'PhantomArcade'] ['Rhythm'] ['2D'] 9536.0 ['OpenFL', 'IndieCade', 'Haxe'] 3 About an hour NaN 0 ['Dance pad'] 1 NaN 0
3 Adventures With Anxiety! 0.0 ['HTML5'] 1 5.0 ['Nicky Case!'] ['Visual Novel'] ['Comedy', 'Mental Health', 'Narrative'] 3158.0 NaN 0 NaN NaN 0 NaN 0 NaN 0
4 Butterfly Soup 0.0 ['Windows', 'macOS', 'Linux'] 3 5.0 ['Brianna Lei'] ['Visual Novel', 'Interactive Fiction'] ['2D', 'Anime', 'Female Protagonist', 'LGBT', ... 3276.0 ["Ren'Py"] 1 A few seconds ['Czech', 'Persian', 'Japanese', 'Korean', 'Po... 7 NaN 0 NaN 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
13963 casual carnival (prototype) 0.0 ['Windows', 'HTML5'] 2 5.0 ['olmewe'] ['Card Game', 'Puzzle'] ['2D', 'Arcade', 'Furry', 'pupy'] 7.0 ['Unity'] 1 About a half-hour ['English'] 1 ['Keyboard', 'Mouse'] 2 NaN 0
13964 Cook'em Up 0.0 ['Windows', 'HTML5'] 2 5.0 ['Carbonara', 'Yanni', 'Aredhele', 'Rémi Cros'... ['Shooter'] NaN 7.0 ['Unity'] 1 NaN NaN 0 NaN 0 NaN 0
13965 Super Funky Light Show - Full version 0.0 NaN 0 5.0 ['OwenSenior'] ['Shooter'] NaN 7.0 NaN 0 NaN NaN 0 NaN 0 NaN 0
13966 A World in a Jar 0.0 ['Windows', 'macOS', 'Linux'] 3 5.0 ['Zeknir', 'Traincraft'] ['Puzzle'] ['Casual', 'Ludum Dare 38', 'Pixel Art', 'Sing... 7.0 NaN 0 About a half-hour ['English'] 1 ['Mouse', 'Touchscreen'] 2 ['Interactive tutorial'] 1
13967 TURDLE C64 0.0 ['HTML5'] 1 5.0 ['Roysterini'] ['Puzzle'] ['Commodore 64', 'megastyle', 'Retro', 'silly'... 7.0 NaN 0 NaN NaN 0 NaN 0 NaN 0

13968 rows × 18 columns

Finally, we have the summary of itchio_df. Unfortunately, itchio_df has many more NaN values than steam_df, in the Made with, Average session, Languages, Inputs and Accessibility columns, as these columns hold extra information that not every game page displays. As a result, we cannot drop the rows with these NaN values without losing most of the data.

In [23]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13968 entries, 0 to 13967
Data columns (total 18 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Name                 13968 non-null  object 
 1   Price                13939 non-null  float64
 2   Platforms            13224 non-null  object 
 3   platforms_count      13968 non-null  Int64  
 4   Rating               13946 non-null  float64
 5   Authors              13966 non-null  object 
 6   Genre                12389 non-null  object 
 7   Tags                 13286 non-null  object 
 8   Number of Reviews    13946 non-null  float64
 9   Made with            7943 non-null   object 
 10  tools_count          13968 non-null  Int64  
 11  Average session      6680 non-null   object 
 12  Languages            5470 non-null   object 
 13  languages_count      13968 non-null  Int64  
 14  Inputs               5840 non-null   object 
 15  inputs_count         13968 non-null  Int64  
 16  Accessibility        2416 non-null   object 
 17  accessibility_count  13968 non-null  Int64  
dtypes: Int64(5), float64(3), object(10)
memory usage: 2.0+ MB

EDA

We can examine the rise in popularity of indie and non-indie games by plotting the total concurrent players of each group against time. The total concurrent players in a month is estimated as the sum of the average concurrent players of every game in that month. A rolling average is used to smooth the graph.
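The estimate described above can be sketched as a group-wise sum followed by a rolling mean. The toy numbers and the window of 3 months are assumptions, as the notebook does not state the window it used.

```python
import pandas as pd

months = pd.date_range("2021-01-01", periods=6, freq="MS")

# Toy stand-in for avg_players_df: one row per game, one column per month.
toy_players = pd.DataFrame(
    [
        [True, 10, 20, 30, 40, 50, 60],
        [True, 0, 0, 6, 12, 18, 24],
        [False, 100, 100, 100, 100, 100, 100],
    ],
    columns=["is_indie", *months],
)

# Readable group labels instead of True/False.
kind = toy_players["is_indie"].map({True: "indie", False: "non-indie"})

# Sum the average concurrent players of all games in each group, per month,
# then transpose so that months form the index.
totals = toy_players.drop(columns="is_indie").groupby(kind).sum().T
totals.index = pd.to_datetime(totals.index)

# Rolling mean over 3 months (assumed window); the first 2 rows are NaN.
smoothed = totals.rolling(window=3).mean()
print(smoothed)
```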

In [24]:

Both indie and non-indie games show a steady increase in the total number of concurrent players from 2013 to 2020. However, non-indie games had a spike in total concurrent players from late 2017 to early 2018, before returning to the normal rate of increase in late 2018. In 2020, the growth in total concurrent players accelerated for both indie and non-indie games, with a slight amount of oscillation.

However, in order to find the rise in popularity of indie games relative to non-indie games, we have to plot the proportion of concurrent players from indie and non-indie games against time, rather than the total number of concurrent players.
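Converting totals to proportions amounts to dividing each month's values by that month's sum. A minimal sketch on hypothetical totals:

```python
import pandas as pd

months = pd.date_range("2021-01-01", periods=3, freq="MS")

# Hypothetical monthly totals of concurrent players per group.
totals = pd.DataFrame(
    {"indie": [10.0, 30.0, 50.0], "non-indie": [90.0, 70.0, 50.0]},
    index=months,
)

# Divide every row by its own sum so the two columns add up to 1 each month.
proportions = totals.div(totals.sum(axis=1), axis=0)
print(proportions)
```

Because both groups are divided by the same monthly total, a change in the indie share reflects growth relative to non-indie games rather than growth of the platform as a whole.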

In [25]:

The proportion of concurrent players from indie games increased steadily from around 12% in 2013 to around 22% in 2022. Since this proportion increased over time, we can infer that indie games have grown faster than non-indie games. There was also a small dip in 2018, which is explained by the spike in total concurrent players of non-indie games.

We can also plot the proportion of concurrent players from indie and non-indie games against time for games with different popularity levels.

In [26]:

The proportion of concurrent players from indie games increased over time, regardless of the popularity level of the games. However, games that were less popular had a greater increase in the proportion of concurrent players from indie games over time.

Therefore, we can infer that the popularity of indie games among gamers has been on the rise and is catching up to the popularity of non-indie games, especially for less popular games.

Next, we can plot the total number of indie and non-indie games released against time.

In [27]:

Both indie and non-indie games show an increasing trend over time. However, the range of the x-axis is unsuitable, since it starts as early as 1970 and there is little increase in the early years, so we can zoom in on the increasing trend from 2000 onwards.

In [28]:

After zooming in, we can see the increasing trends of both indie and non-indie games more clearly. The total number of indie games grew exponentially from 2008 onwards, surpassing the total number of non-indie games in 2015. This growth can be better visualised if we instead plot the proportion of indie and non-indie games released against time.
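Both the running totals and the proportions can be derived from the release dates alone. The sketch below counts releases per year on a toy frame, accumulates them, and converts the result to shares; grouping by calendar year is an assumption, and games without a date are dropped.

```python
import pandas as pd

# Toy stand-in for steam_df with one missing release date.
toy_df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2007-10-10", "2011-04-18", "2012-06-19", "2014-03-07", "2014-03-28", None]
    ),
    "is_indie": [False, False, True, True, False, True],
})

released = toy_df.dropna(subset=["date"]).assign(
    year=lambda d: d["date"].dt.year,
    kind=lambda d: d["is_indie"].map({True: "indie", False: "non-indie"}),
)

# Releases per year and group; years with no release in a group count as 0.
per_year = released.groupby(["year", "kind"]).size().unstack(fill_value=0)

# Running total of games released up to and including each year.
cumulative = per_year.cumsum()

# Share of each group among all games released so far.
share = cumulative.div(cumulative.sum(axis=1), axis=0)
print(share)
```

Years in which no game was released do not appear as rows in this sketch; on the full dataset this matters only for the sparse early years.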

In [29]:

The proportion of indie games released increased from less than 10% in 2000 to around 75% in 2022. Here, we can clearly see the rapid growth from 2008 onwards, with the proportion of indie games released reaching 50% in 2015. Given the exponential growth in the number of indie games released from 2008 onwards, and how much faster indie games have grown relative to non-indie games, we can conclude that the demand for and the prevalence of indie games truly started to increase rapidly from 2008 onwards.

According to the graph, around 75% of the games in the dataset are indie games as of 2022. We can confirm this with a pie chart.

In [30]:

We can also plot pie charts of each of the bins, to show the proportion of indie and non-indie games at different popularity levels.

In [31]:

As the popularity of games increases, the proportion of indie games decreases, from 78.7% in the "Lowest" bin to only 40.3% in the "Highest" bin. This shows that even though indie games have a faster rate of growth than non-indie games, they have not yet overtaken non-indie games in terms of popularity, especially among the biggest and most popular games.

However, there is another way to compare the popularity of indie and non-indie games. By plotting boxplots of the total number of reviews of indie and non-indie games at different levels of popularity, we can infer whether indie games are comparable in size and popularity to non-indie games. We have to separate the different levels of popularity into different boxplots, because their y-axis ranges differ greatly.

In [32]:

Unfortunately, the many outliers above the upper bound compress the boxes so much that they cannot be read, so we need to hide the outliers.
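To make the "upper bound" concrete, the sketch below computes the conventional boxplot fence, Q3 + 1.5 × IQR, for each group and flags the points above it. It assumes the notebook's boxplots use this default whisker rule; the data is a toy example. In Matplotlib, Axes.boxplot accepts showfliers=False to leave such points out of the plot.

```python
import pandas as pd

# Toy review counts with one extreme value per group.
toy_df = pd.DataFrame({
    "kind": ["indie"] * 6 + ["non-indie"] * 6,
    "total_reviews": [10, 12, 15, 18, 20, 5000, 8, 9, 11, 14, 16, 9000],
})

grouped = toy_df.groupby("kind")["total_reviews"]
q1 = grouped.quantile(0.25)
q3 = grouped.quantile(0.75)

# Conventional upper fence: points above it are drawn as outliers.
upper_fence = q3 + 1.5 * (q3 - q1)

# Compare every game with the fence of its own group.
toy_df["is_outlier"] = toy_df["total_reviews"] > toy_df["kind"].map(upper_fence)
outliers_per_group = toy_df.groupby("kind")["is_outlier"].sum()
print(upper_fence)
print(outliers_per_group)
```

Hiding the outliers only changes what is drawn: the medians and quartiles are still computed from all games, including the hidden ones.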

In [33]:

In all 4 bins, the median of indie games is higher than the median of non-indie games. Both indie and non-indie games have distributions that are skewed to the right in all 4 bins. The IQR of indie games is larger than that of non-indie games in the "Highest" and "Low" bins, and smaller in the "High" and "Lowest" bins.

The median total number of reviews of indie games is consistently higher than that of non-indie games, so we can infer that indie games are comparable in scale and popularity to non-indie games, regardless of the level of popularity.
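The quantities read off the boxplots can also be tabulated directly, which makes the comparison less dependent on reading a graph. The following is a sketch on toy data with a single bin; iqr is a hypothetical helper, and the sample skewness is used here as a numeric stand-in for the visual judgement of skew.

```python
import pandas as pd

# Toy stand-in for steam_df restricted to one popularity bin.
toy_df = pd.DataFrame({
    "owners_binned": ["High"] * 8,
    "kind": ["indie"] * 4 + ["non-indie"] * 4,
    "total_reviews": [10, 20, 30, 100, 5, 10, 15, 90],
})


def iqr(values):
    """Interquartile range: distance between the 75th and 25th percentiles."""
    return values.quantile(0.75) - values.quantile(0.25)


# One row per (bin, group) with the median, IQR and sample skewness.
summary = toy_df.groupby(["owners_binned", "kind"])["total_reviews"].agg(
    median="median", iqr=iqr, skew="skew"
)
print(summary)
```

A positive skew value corresponds to a distribution skewed to the right, and a negative value to one skewed to the left.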

Q2: What are the major differences between indie games and AAA games?

Firstly, we can compare the quality of indie and non-indie games by plotting the distribution of rating of indie and non-indie games.

In [34]:

Indie games have a higher median rating than non-indie games. Both indie and non-indie games have distributions that are skewed to the left. Indie games have a smaller IQR than non-indie games. Both indie and non-indie games have outliers below the lower bound.

This pattern is also consistent at different popularity levels.

In [35]:

In all 4 bins, indie games have a higher median than non-indie games, both indie and non-indie games have distributions that are skewed to the left, indie games have a smaller IQR than non-indie games, and both indie and non-indie games have outliers below the lower bound.

It is also worth noting that the "Highest" bin has the greatest difference in median rating between indie and non-indie games among the 4 bins. This suggests that the difference in quality becomes larger at the highest level of popularity, where indie games are much better received than non-indie games of around the same popularity.

Regardless of the level of popularity, indie games are overall more enjoyable and more positively received than non-indie games, as seen by the higher median of indie games. There is also less variation in quality in indie games than in non-indie games, shown by the smaller IQR of indie games.

Next, we can compare the length of indie and non-indie games by plotting the distribution of average playtime of indie and non-indie games.

In [36]:

There are many outliers above the upper bounds of indie and non-indie games. However, even if we hide outliers, too many of the values are 0, thus we are unable to get useful boxplots.

In [37]:

If we plot the distribution of average playtime of indie and non-indie games by the level of popularity, then we can get useful boxplots.

In [38]:

Too many of the values are 0 in the "Lowest" bin, so we are unfortunately unable to use it for any observations. The other bins, however, do allow us to observe some trends.

The median of indie games is lower than the median of non-indie games in the "Highest" and "High" bins, while the medians of both indie and non-indie games are 0 in the "Low" bin. Both indie and non-indie games have distributions that are skewed to the right in all 3 bins. Indie games have a lower IQR than non-indie games in the "Highest" and "High" bins, and a higher IQR in the "Low" bin.
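How many values are 0 in each bin can be checked directly, as in the sketch below on toy data. A median of 0 means that at least half of the games in a group have a recorded playtime of 0, which is why the boxes collapse.

```python
import pandas as pd

# Toy stand-in for steam_df; playtime is in hours.
toy_df = pd.DataFrame({
    "owners_binned": ["Lowest"] * 4 + ["Low"] * 2 + ["Highest"] * 2,
    "avg_playtime": [0.0, 0.0, 0.0, 20.7, 0.48, 0.0, 16.6, 30.9],
})

# Fraction of games in each bin whose average playtime is exactly 0.
zero_share = toy_df["avg_playtime"].eq(0).groupby(toy_df["owners_binned"]).mean()
print(zero_share)
```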

We can get similar results if we plot the distribution of median playtime as well.

In [39]:
In [40]:
In [41]:

From these graphs, we can conclude that indie games are overall shorter in length and do not have as much content as non-indie games, as shown by indie games having a lower median playtime. This makes sense, as indie game developers have fewer resources and less manpower and are unable to create a game as large in scale as a non-indie game.

We can also compare how expensive indie and non-indie games are by plotting the distribution of price of indie and non-indie games.

In [42]:

By hiding the outliers above the upper bounds of both indie and non-indie games, we get the following graph.

In [43]:

The median of indie games is lower than the median of non-indie games. Both indie and non-indie games have a distribution that is skewed to the right. Indie games have a smaller IQR than non-indie games.
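The "upper bound" used when hiding outliers is presumably the standard boxplot fence of Q3 + 1.5 × IQR; that is an assumption, since the plotting code is not shown. A minimal sketch on invented prices:

```python
import pandas as pd

# Hypothetical prices; one value is far above the rest.
prices = pd.Series([0.0, 0.99, 1.99, 4.99, 4.99, 9.99, 14.99, 59.99], name="price")

# Quartiles and the usual 1.5 * IQR fences used by boxplots.
q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Points outside the fences are the ones drawn (or hidden) as outliers.
outliers = prices[(prices < lower_bound) | (prices > upper_bound)]
print(upper_bound, outliers.tolist())
```

Hiding these points only changes what is drawn; the median and quartiles are still computed from all of the data.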

We get similar results if we separate the games by level of popularity.

In [44]:

By hiding the outliers above the upper bounds of both indie and non-indie games, we get the following graph.

In [45]:

In all 4 bins, the median of indie games is lower than the median of non-indie games and both indie and non-indie games have a distribution that is skewed to the right. Indie games have a smaller IQR than non-indie games for all bins except for the "Lowest" bin, where the IQRs are equal.

It is also worth noting that as popularity increases, the difference between the medians of indie and non-indie games also increases, which may show that more popular non-indie games are much more expensive than indie games of around the same popularity.

From the graphs, we can conclude that overall, indie games are cheaper than non-indie games, as shown by indie games having a lower median of price. Non-indie games also have a more diverse range of prices, as shown by the higher IQR of non-indie games.

If a game is of a higher quality and is made more accessible for players, the developers of the game would provide players with more language options within their game. Therefore, we can compare the distribution of the number of languages of indie and non-indie games to compare how accessible indie and non-indie games are, as well as to give a general gauge of the quality of indie and non-indie games.

In [46]:

The overall shapes of the distributions of indie and non-indie games are similar, having a peak at 1 language before decreasing in density as the number of languages increases. It is also worth noting that there is a small spike in density between 25 and 30 languages, which most likely represents the biggest and most popular indie and non-indie games that have many language options.

However, there is a greater density of indie games with fewer than 5 languages compared to non-indie games, while non-indie games are spread over a wider range of language counts, having a greater density than indie games when the number of languages is 5 or greater.

Therefore, we can conclude that overall, non-indie games have a greater number of language options than indie games, which shows that indie games are less accessible than non-indie games. However, this does not necessarily mean that indie games are poorer in quality. It could simply be that indie developers, due to their lack of manpower, cannot find people who are fluent in different languages to provide translations, while non-indie games have many different people working on them, some of whom are fluent in other languages.

We can compare the amount of manpower working behind indie and non-indie games by plotting the distribution of the number of developers.

In [47]:

However, since the total numbers of indie and non-indie games are unequal, it is unfair and inaccurate to just compare the distribution of the number of indie and non-indie games as there are many more indie games than non-indie games in the dataset. Therefore, it is more appropriate to plot the distribution of the proportion of indie and non-indie games by the number of developers.

In [48]:

Since the proportions of indie and non-indie games with 6 developers and above are very small, we can sum the proportions of indie and non-indie games with 5 developers and above.
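The two steps described above, converting counts to within-group proportions and lumping the sparse tail into a single "5 and above" category, can be sketched as follows. The data is made up and the `group` labels are introduced here only for readability; `developers_count` and `is_indie` are the dataset's column names.

```python
import pandas as pd

# Hypothetical toy data: 6 indie games and 4 non-indie games.
games = pd.DataFrame({
    "is_indie": [True] * 6 + [False] * 4,
    "developers_count": [1, 1, 1, 1, 2, 7, 1, 2, 3, 6],
})

# Readable group labels (an illustrative choice, not from the original notebook).
group = games["is_indie"].map({True: "indie", False: "non-indie"})

# Lump every count of 5 or more into a single category labelled 5.
capped = games["developers_count"].clip(upper=5)

# normalize="index" divides each row by its own total, so each group
# sums to 100% regardless of how many games it contains.
proportions = pd.crosstab(group, capped, normalize="index") * 100
print(proportions.round(2))
```

Because each row is normalised by its own total, the two groups can be compared directly even though there are many more indie games than non-indie games.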

In [49]:

Both indie and non-indie games have the largest proportion of games with only 1 developer. However, the proportion of indie games with only 1 developer is larger than that of non-indie games, while for games with 2 or more developers, the proportion of non-indie games is larger than that of indie games. This shows that overall, non-indie games are more likely to have a greater number of developers.

We can get a similar trend by plotting the distribution of the number of publishers.

In [50]:

By plotting the distribution of the proportion of indie and non-indie games by the number of publishers, we get the following graph.

In [51]:

Both indie and non-indie games have the largest proportion of games with only 1 publisher. However, the proportion of indie games with only 1 publisher is larger than that of non-indie games, while for games with 2 or more publishers, the proportion of non-indie games is larger than that of indie games. This shows that overall, non-indie games are more likely to have a greater number of publishers.

Therefore, indie games have less manpower behind them than non-indie games, as shown by the smaller number of developers and publishers.

We can compare the most popular genres and tags of indie and non-indie games to find out what types of indie and non-indie games are being produced.

Firstly, we need to find the proportions of indie and non-indie games that are in each genre.

In [52]:
Out[52]:
True False
Action 44.760221 36.252275
Casual 43.901278 33.438331
Adventure 41.138609 32.724346
Strategy 19.424803 20.271595
Simulation 19.282023 22.126557
RPG 18.064999 16.890662
Early Access 11.719246 8.021840
Free to Play 7.846070 11.248775
Sports 4.161001 6.418872
Racing 3.474300 4.521910
Massively Multiplayer 2.311667 4.822904
Education 0.033995 0.041999
Utilities 0.024930 0.007000
Design & Illustration 0.015864 0.027999
Web Publishing 0.013598 NaN
Audio Production 0.013598 NaN
Animation & Modeling 0.013598 NaN
Game Development 0.011332 0.014000
Software Training 0.011332 0.014000
Accounting 0.009065 NaN
Video Production 0.009065 0.007000
Movie 0.004533 NaN
Photo Editing 0.004533 NaN
Short 0.002266 NaN
Episodic 0.002266 NaN
Documentary 0.002266 NaN
Tutorial 0.002266 NaN
360 Video 0.002266 NaN
\nUnder S$12\n NaN 0.014000
\nUnder S$6\n NaN 0.014000

Beyond the top 11 genres, the proportions of indie and non-indie games become very small, at less than 0.1%. Therefore, we should only consider the top 11 genres.
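A table like the one above can be built by expanding each game's genre list into one row per genre and dividing the counts by the number of games in each group. This is a sketch on toy data; it assumes the `genres` column already holds Python lists and that the "Indie" genre is dropped, and the `group` labels are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 3 indie games and 2 non-indie games.
games = pd.DataFrame({
    "is_indie": [True, True, True, False, False],
    "genres": [
        ["Action", "Indie"],
        ["Action", "Casual", "Indie"],
        ["Adventure", "Indie"],
        ["Action", "Strategy"],
        ["Strategy"],
    ],
})
games = games.assign(group=games["is_indie"].map({True: "indie", False: "non-indie"}))

# One row per (game, genre) pair, without the "Indie" genre itself.
exploded = games.explode("genres")
exploded = exploded[exploded["genres"] != "Indie"]

# Count games per genre and group, then divide by the number of games
# in each group (not by the number of genre labels).
counts = pd.crosstab(exploded["genres"], exploded["group"])
group_sizes = games["group"].value_counts()
genre_pct = counts.div(group_sizes, axis="columns") * 100

# Keep the genres with the largest percentage in either group.
top = genre_pct.loc[genre_pct.max(axis=1).nlargest(2).index]
print(top.round(2))
```

Since a game can belong to several genres, the percentages in a column can add up to more than 100, as they do in the table above.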

In [53]:
Out[53]:
True False
Action 44.760221 36.252275
Casual 43.901278 33.438331
Adventure 41.138609 32.724346
Strategy 19.424803 20.271595
Simulation 19.282023 22.126557
RPG 18.064999 16.890662
Early Access 11.719246 8.021840
Free to Play 7.846070 11.248775
Sports 4.161001 6.418872
Racing 3.474300 4.521910
Massively Multiplayer 2.311667 4.822904

Finally, we can plot the proportions of indie and non-indie games that are in each of the top 11 genres.

In [54]:

The "Action", "Casual" and "Adventure" genres were the 3 top genres for both indie and non-indie games. However, there is a higher proportion of indie games that are in these 3 top genres compared to non-indie games. Other than the top 3 genres, there is also a higher proportion of indie games in the "RPG" and "Early Access" genres, whereas the "Strategy", "Simulation", "Free to Play", "Sports", "Racing" and "Massively Multiplayer" genres have a higher proportion of non-indie games.

This graph shows us that indie games are not as diverse in their genres as non-indie games, as seen by a higher proportion of indie games being in the top 3 genres instead of having a more even distribution. This can be due to the limitations that indie games face but non-indie games do not, restricting the genres of games that indie game developers can produce. For example, games in the "Simulation", "Sports" and "Racing" genres might require a level of realism in terms of graphics and gameplay, which might require more resources and manpower than indie games have. Games in the "Strategy" genre might require more complicated and in-depth game mechanics to keep players hooked, while games in the "Massively Multiplayer" genre would require running servers to support multiplayer, both of which might be difficult for an individual to implement without prior knowledge and resources.

We can also find the proportions of indie and non-indie games that have each tag.

In [55]:
Out[55]:
True False
"1990s" 2.162089 3.429931
"Beat em up" 1.559242 1.602968
"Shoot Em Up" 4.208594 2.246955
1980s 2.066902 1.434971
2.5D 2.089566 1.924962
... ... ...
World War I 0.151845 0.377992
World War II 0.577917 2.288954
Wrestling 0.074789 0.153997
Zombies 3.141148 2.799944
eSports 0.677636 1.203976

423 rows × 2 columns

Now, we can plot the top 20 tags with the highest proportion of indie or non-indie games.

In [56]:

"Singleplayer" was by far the most popular tag for both indie and non-indie games, with the proportion of non-indie games with the "Singleplayer" tag only being slightly higher than that of indie games. There were also many other tags that appeared in the top 10 tags for both indie and non-indie games, which are the "Multiplayer", "2D", "3D", "Story Rich", "Atmospheric", "Puzzle", "Fantasy", "Anime", "Cute", "Colorful" and "Arcade" tags.

Indie and non-indie games share 12 of their top 20 tags, so these graphs unfortunately do not tell us much about how the types of indie and non-indie games differ. Rather, they just show the tags that are popular overall.

Therefore, we instead need to find the top 20 tags with the greatest ratio of indie games to non-indie games, and vice versa.
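The ratio-based ranking can be sketched as below, starting from a table of tag percentages per group. The numbers and the column names `indie` and `non_indie` are hypothetical, and only the top 2 are taken here instead of the top 20.

```python
import pandas as pd

# Hypothetical tag percentages (percent of games in each group with the tag).
tag_pct = pd.DataFrame(
    {"indie": [60.0, 4.0, 1.0, 2.0], "non_indie": [62.0, 1.0, 3.0, 2.0]},
    index=["Singleplayer", "Roguelike", "Classic", "Zombies"],
)

# How many times more common a tag is in one group than in the other.
indie_ratio = (tag_pct["indie"] / tag_pct["non_indie"]).sort_values(ascending=False)
non_indie_ratio = (tag_pct["non_indie"] / tag_pct["indie"]).sort_values(ascending=False)

top_indie = indie_ratio.head(2)
top_non_indie = non_indie_ratio.head(2)
print(top_indie)
print(top_non_indie)
```

A tag such as "Singleplayer", which is common in both groups, has a ratio near 1 and drops out of both rankings, which is the point of ranking by ratio. Tags missing from one group produce infinite or undefined ratios, so they may need to be filtered out first.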

In [57]:

Now, we can finally see some patterns in the types of games being developed as indie and non-indie games.

For indie games, "Short" is the top tag by quite a margin. This makes sense, as indie game developers usually do not have the resources or manpower to create extremely long games with a lot of content. However, some of the other tags do give us an idea of how indie game developers work around these limits. For example, some indie games contain some sort of procedural generation, which is an algorithmic process of generating gameplay. This can allow the gameplay to feel fresh and less repetitive without the need for hand-made content, increasing the replay value of indie games. As it turns out, "Procedural Generation" and "Replay Value" are both included in the top 20 tags for indie games. Some examples of games that use procedural generation are roguelikes and roguelites, which also appear as the tags "Roguelike" and "Roguelite". Some indie games also make gameplay more fun by making it more difficult or fast-paced and requiring time to master, which can explain the tags "Difficult" and "Fast-Paced". Some of these games can include "Bullet Hell", "Top-Down Shooter" and "Shoot Em Up", which also appear as tags. Finally, platformers and puzzle games are quite popular among indie games, with the tags "Puzzle Platformer", "Platformer", "Logic" and "Puzzle" all appearing in the top 20. This can be due to puzzle games and platformers usually having simpler types of gameplay than other types of games.

On the other hand, for non-indie games, "Classic" is the top tag by quite a margin. This can be due to many non-indie games being seen as classics or having recognisable characters in them. There are also some types of games that are more complicated and require more resources and manpower. There are the "Historical", "Military", "War" and "Driving" tags, where the games have to be as realistic as possible, making them complex to develop. There are the "RTS", "JRPG", "Turn-Based Strategy" and "Tactical" tags, where the games need in-depth strategy and enough balancing to create interesting gameplay. There are also games with the "Open World" tag that are usually very large in scale, as the world has to be large enough to incentivise players to explore it. Lastly, there are the "Multiplayer", "Online Co-Op", "VR", "PvP" and "Co-op" tags, which would require external infrastructure or hardware, such as servers and VR headsets, in order to run.

Therefore, these graphs and tags show that in order to combat the lack of resources and manpower, there are some patterns that emerge among indie games, such as making gameplay more unique or interesting, as well as sticking to types of games that are easier to develop over other types that can be more difficult to develop.

Q3: What factors contribute to the success of an indie game?

Firstly, we need to select only the indie games in the data set.

In [58]:
Out[58]:
id name is_indie total_reviews positive_reviews negative_reviews rating owners min_owners max_owners ... date developers developers_count publishers publishers_count languages languages_count genres tags owners_binned
1 1118200 People Playground True 129491 128053 1438 98.89 3500000.0 2000000.0 5000000.0 ... 2019-07-23 ['mestiez'] 1 ['Studio Minus'] 1 ['English'] 1 ['Action', 'Casual', 'Indie', 'Simulation'] ['Sandbox', 'Physics', 'Gore', 'Violent', 'Mod... Highest
2 1794680 Vampire Survivors True 114325 113067 1258 98.90 3500000.0 2000000.0 5000000.0 ... 2021-12-17 ['poncle'] 1 ['poncle'] 1 ['English'] 1 ['Action', 'Casual', 'Indie', 'RPG', 'Early Ac... ['Action Roguelike', 'Pixel Graphics', 'Bullet... Highest
3 1145360 Hades True 194006 191332 2674 98.62 7500000.0 5000000.0 10000000.0 ... 2020-09-17 ['Supergiant Games'] 1 ['Supergiant Games'] 1 ['English', 'French', 'Italian', 'German', 'Sp... 11 ['Action', 'Indie', 'RPG'] ['Action Roguelike', 'Indie', 'Roguelite', 'Ac... Highest
4 413150 Stardew Valley True 485586 476588 8998 98.15 15000000.0 10000000.0 20000000.0 ... 2016-02-26 ['ConcernedApe'] 1 ['ConcernedApe'] 1 ['English', 'German', 'Spanish - Spain', 'Japa... 12 ['Indie', 'RPG', 'Simulation'] ['Farming Sim', 'Life Sim', 'Pixel Graphics', ... Highest
5 105600 Terraria True 987441 966348 21093 97.86 35000000.0 20000000.0 50000000.0 ... 2011-05-16 ['Re-Logic'] 1 ['Re-Logic'] 1 ['English', 'French', 'Italian', 'German', 'Sp... 9 ['Action', 'Adventure', 'Indie', 'RPG'] ['Open World Survival Craft', 'Sandbox', 'Surv... Highest
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
58402 1154790 XIII True 922 133 789 14.43 1500000.0 1000000.0 2000000.0 ... 2020-11-10 ['PlayMagic'] 1 ['Microids'] 1 ['English', 'French', 'Italian', 'Spanish - Sp... 5 ['Action'] ['FPS', 'Shooter', 'Stealth', 'PvP', 'Action',... Highest
58403 369080 Age of Survival True 229 26 203 11.35 10000.0 0.0 20000.0 ... 2015-08-25 ['Seattletek'] 1 ['Seattletek'] 1 ['English', 'German', 'Russian'] 3 ['Action', 'Adventure', 'Indie', 'Simulation',... ['Survival', 'Simulation', 'Early Access', 'Op... Lowest
58406 210490 Fray True 50 1 49 2.00 10000.0 0.0 20000.0 ... 2012-06-19 ['Brain Candy'] 1 ['Brain Candy'] 1 ['English'] 1 ['Action', 'Strategy', 'Indie'] ['Strategy', 'Action', 'Indie'] Lowest
58407 257930 Race To Mars True 261 20 241 7.66 10000.0 0.0 20000.0 ... 2014-03-07 ['INTERMARUM', 'ONE MORE LEVEL'] 2 ['ONE MORE LEVEL'] 1 ['English', 'Polish'] 2 ['Indie', 'Simulation', 'Strategy', 'Early Acc... ['Strategy', 'Simulation', 'Indie', 'Space', '... Lowest
58409 397760 Urban War Defense True 79 2 77 2.53 10000.0 0.0 20000.0 ... 2017-07-07 ['Budgie Games'] 1 ['Budgie Games'] 1 ['English'] 1 ['Action', 'Indie', 'Strategy', 'Early Access'] ['Action', 'Indie', 'Early Access', 'Strategy'... Lowest

44124 rows × 23 columns

We can compare the quality of indie games of different popularities by plotting the distribution of rating of indie games of different popularities.

In [59]:

All 4 bins have distributions that are skewed to the left and have outliers below the lower bound. As popularity increases, the median increases, while the IQR decreases.

Therefore, more popular indie games are more enjoyable and better received, as shown by the higher medians of rating, as well as being more consistent in quality, as shown by the lower IQRs.

Next, we can compare the length of indie games of different popularities by plotting the distribution of average playtime of indie games of different popularities.

In [60]:

By hiding the outliers above the upper bounds of all bins, we get the following graph.

In [61]:

As popularity decreases, the median decreases, and the IQR decreases as well. All 4 bins had distributions skewed to the right.

Similar results are shown by plotting the distribution of median playtime of indie games of different popularities.

In [62]:
In [63]:

Therefore, more popular indie games are longer and have more content, as shown by the higher medians of playtime, which can mean that they are of a larger scale.

We can also compare how expensive indie games of different popularities are by plotting the distribution of price of indie games of different popularities.

In [64]:

By hiding the outliers above the upper bounds of all bins, we get the following graph.

In [65]:

As popularity increases, the median also increases, and the IQR increases as well. All 4 bins have distributions that were skewed to the right.

Therefore, more popular indie games are more expensive, as shown by the higher medians of price.

If a game is of a higher quality and is made more accessible for players, the developers of the game would provide players with more language options within their game. Therefore, we can compare the distribution of the number of languages of indie games of different popularities to compare how accessible indie games of different popularities are, as well as to give a general gauge of the quality of indie games of different popularities.

In [66]:

As popularity increases, the density of indie games with fewer than 5 languages decreases, while the density of indie games with 5 or more languages increases. While "High", "Low" and "Lowest" have similar shapes, with a peak at 1 language before decreasing in density as the number of languages increases, "Highest" has a completely different shape, being much more spread out with a maximum density at around 10 languages.

Therefore, we can conclude that overall, as popularity increases, the number of languages increases. This implies that indie games that are more popular would be of a higher quality and are more accessible for players.

We can compare the most popular genres and tags of indie games of different popularities to find out what types of indie games are being produced.

Firstly, we need to find the proportions of indie games of different popularities that are in each of the same top 11 genres as before.

In [67]:
Out[67]:
Highest High Low Lowest
Action 59.798995 46.625937 44.748511 44.446242
Adventure 40.703518 44.154402 43.810972 40.015535
Casual 20.100503 32.463205 38.350437 46.876821
Strategy 24.120603 26.076090 21.170658 18.282737
Simulation 25.879397 22.382671 17.914608 19.370186
RPG 23.115578 23.215773 22.855695 16.169331
Free to Play 20.854271 16.606498 12.504751 2.916046
Early Access 8.040201 8.969731 9.742810 12.486245
Massively Multiplayer 9.045226 5.331852 2.761941 1.634410
Sports 3.266332 3.360178 3.268719 4.566639
Racing 1.507538 3.110247 3.078677 3.673377

We can then plot a heatmap showing the proportions of indie games of different popularities that are in each of the top 11 genres.

In [68]:

The "Action" genre had the highest proportion of games in every bin except the "Lowest" bin, where the "Casual" genre was slightly higher. As popularity increases, the "Action", "Strategy", "Simulation", "RPG", "Free to Play" and "Massively Multiplayer" genres have an increasing trend, the "Casual", "Early Access", "Sports" and "Racing" genres have a decreasing trend, and the "Adventure" genre has no obvious trend.

Unfortunately, due to the results being so mixed, no obvious pattern emerges, unlike when indie and non-indie games were being compared.

We can also compare the median of ratings at different popularity levels for each of the top 11 genres.
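A genre-by-popularity table of median ratings can be produced with a pivot table after expanding the genre lists. The sketch below uses invented ratings and only two bins; it assumes `genres` holds Python lists.

```python
import pandas as pd

# Hypothetical toy data with two popularity bins.
games = pd.DataFrame({
    "owners_binned": ["Highest", "Highest", "Lowest", "Lowest", "Lowest"],
    "rating": [92.0, 88.0, 70.0, 80.0, 60.0],
    "genres": [["Action", "RPG"], ["Action"], ["Action"], ["RPG"], ["Action", "RPG"]],
})

# One row per (game, genre) pair, then the median rating in each cell.
exploded = games.explode("genres")
median_rating = exploded.pivot_table(
    index="genres", columns="owners_binned", values="rating", aggfunc="median"
)

# Order the columns from most to least popular for the heatmap.
median_rating = median_rating[["Highest", "Lowest"]]
print(median_rating)
```

Swapping `values="rating"` for a price column would give the corresponding table of median prices.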

In [69]:
Out[69]:
Highest High Low Lowest
Casual 89.990 82.780 78.740 80.000
Adventure 91.350 83.060 78.180 78.015
RPG 90.085 81.525 77.560 76.920
Action 88.905 80.910 76.930 79.310
Early Access 90.935 80.900 76.280 76.615
Racing 90.470 82.340 75.000 75.000
Sports 88.760 81.510 76.530 75.000
Simulation 91.860 82.655 75.020 72.000
Strategy 89.070 80.970 75.000 75.470
Free to Play 83.960 81.100 76.820 73.950
Massively Multiplayer 74.790 68.260 65.665 62.860

We can now plot a heatmap showing the median of ratings of indie games at different popularity levels for each of the top 11 genres.

In [70]:

Not many patterns emerge, as the main pattern of the median increasing with popularity is consistent across all genres. However, there are two exceptions: the "Free to Play" genre has a noticeably lower median in the "Highest" bin, and the "Massively Multiplayer" genre has a significantly lower median in all 4 bins. This could be because very popular free games are not as high in quality as other games of similar popularity, and because massively multiplayer games are too difficult for indie developers to develop successfully and effectively.

Finally, we can compare the median of prices at different popularity levels for each of the top 11 genres.

In [71]:
Out[71]:
Highest High Low Lowest
Early Access 17.49 9.99 6.99 7.99
Simulation 16.99 9.99 6.99 4.99
Strategy 16.49 9.99 5.99 4.99
RPG 14.99 9.99 5.99 4.99
Adventure 14.99 9.99 4.99 4.99
Action 14.99 7.99 4.99 4.99
Racing 14.99 4.99 3.99 4.99
Sports 1.99 4.99 4.99 4.99
Casual 4.99 2.99 2.99 3.99
Massively Multiplayer 0.00 0.00 0.00 4.99
Free to Play 0.00 0.00 0.00 0.00

We can now plot a heatmap showing the median of prices of indie games at different popularity levels for each of the top 11 genres.

In [72]:

Similarly, not many patterns emerge, as the main pattern of the median increasing with popularity holds for all genres except 4. These are the "Sports" and "Massively Multiplayer" genres, which have a decreasing trend, and the "Casual" and "Free to Play" genres, which have no clear trend.

Unfortunately, the genres are unable to give us any new insight, unlike when comparing indie and non-indie games.

Next, we can find the proportions of indie games of different popularities that have each tag.

In [73]:
Out[73]:
Highest High Low Lowest
"1990s" 1.507538 1.971675 2.103129 2.110169
"Beat em up" 2.010050 1.805054 1.976435 1.394912
"Shoot Em Up" 0.753769 4.137740 5.346510 3.954949
1980s 1.507538 1.221883 2.153807 2.071331
2.5D 1.256281 2.915857 2.660585 1.754159
... ... ... ... ...
World War I 0.251256 0.305471 0.304067 0.097094
World War II 1.758794 0.944182 0.937540 0.440158
Wrestling 0.251256 0.055540 0.076017 0.071202
Zombies 9.045226 4.804221 3.585455 2.757460
eSports 1.005025 1.332963 0.671481 0.521069

421 rows × 4 columns

Now, we can plot a heatmap of the top 20 tags with the highest proportion of indie games of different popularities.

In [74]:

"Singleplayer" was again the most popular tag for both indie games from all 4 bins, having an increasing trend when popularity increases. However, similarly to when we compared indie and non-indie games, not many observations can be made, as many of the same tags can be visible in the top 20 tags of most or all 4 bins, which can be linked back to these tags just simply being the most popular overall.

Therefore, for each of the 4 bins, we instead need to find the top 20 tags with the greatest ratio of games from that specific bin to the sum of games from other bins.
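One possible reading of this ratio is each bin's tag percentage divided by the sum of the percentages in the other three bins. The source does not say whether percentages or raw counts were used, so the sketch below is an assumption, on invented numbers.

```python
import pandas as pd

# Hypothetical tag percentages per popularity bin.
tag_pct = pd.DataFrame(
    {
        "Highest": [9.0, 1.0, 30.0],
        "High": [5.0, 2.0, 30.0],
        "Low": [4.0, 4.0, 30.0],
        "Lowest": [3.0, 6.0, 30.0],
    },
    index=["Zombies", "2D Platformer", "Singleplayer"],
)

# For every cell: the row total minus the cell itself, i.e. the other bins.
others = tag_pct.rsub(tag_pct.sum(axis=1), axis="index")

# Ratio of each bin to the sum of the remaining bins.
ratio = tag_pct / others
print(ratio.round(3))
```

With 4 bins, a tag that is equally common everywhere scores 1/3 in every bin, so values well above 1/3 mark the tags that are distinctive for a bin.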

In [75]:

There are some patterns that emerge among games of different popularities.

For the "Highest" bin, games that require more manpower and resources to develop appear. For example, many of the tags relate to multiplayer games, which require servers to run and can be difficult to set up without prior knowledge or resources, such as the "Online Co-Op", "Co-Op", "Competitive", "Multiplayer" and "PvP" tags. Open world and sandbox games, which have to be much larger in scale than other types of games to incentivise players to explore and be creative, are also more prevalent among indie games in this bin, as seen by the "Open World Survival Craft", "Sandbox" and "Open World" tags. This is somewhat similar to what we saw with non-indie games when we compared indie and non-indie games.

On the other hand, for the "Lowest" bin, platformers were more popular, as seen by the "2D Platformer", "Precision Platformer" and "3D Platformer" tags. This could be due to how platformers have simpler types of gameplay that can be easier to develop than other games, similar to what happened with indie games when we compared indie and non-indie games.

Other than that, there were no other notable trends in the remaining bins, only types of games that were more popular in each bin for reasons that are not clear from the data. For the "High" bin, turn-based games ("Turn-Based", "JRPG", "Turn-Based Combat", "Turn-Based Tactics", "Turn-Based Strategy") were more popular. For the "Low" bin, RPGs ("RPG Maker", "JRPG", "Turn-Based Tactics", "Turn-Based Combat") and interactive fiction games ("Visual Novel", "Interactive Fiction", "Multiple Endings", "Choose Your Own Adventure") were more popular.

Finally, we can find how the rating, playtime, price, number of languages and number of developers of a game affect the popularity of a game. To do this, we can find the correlation coefficient and P-value that rating, average playtime, price, number of languages and number of developers have with the total number of reviews.
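A Pearson coefficient and its P-value can be obtained with `scipy.stats.pearsonr`; whether the original cell used this function is an assumption. The sketch below runs on synthetic data in which playtime is built to correlate with reviews and the number of developers is not.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data (not the Steam dataset).
rng = np.random.default_rng(0)
n = 500
avg_playtime = rng.exponential(scale=100.0, size=n)
developers_count = rng.integers(1, 4, size=n)
total_reviews = 5.0 * avg_playtime + rng.normal(scale=400.0, size=n)

# Pearson coefficient and two-sided P-value for each variable.
results = {}
for name, values in {"avg_playtime": avg_playtime,
                     "developers_count": developers_count}.items():
    coef, p_value = stats.pearsonr(values, total_reviews)
    results[name] = (coef, p_value)
    print(f"{name}: pearson_coef={coef:.3f}, p_value={p_value:.3g}")
```

With tens of thousands of rows, even a very weak correlation can have a tiny P-value, so the size of the coefficient says more about the strength of the relationship than the P-value does.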

In [76]:
Out[76]:
pearson_coef p_value
rating 0.044425 4.296009e-20
avg_playtime 0.190546 0.000000e+00
price 0.079920 2.102898e-61
languages_count 0.087947 5.124589e-74
developers_count 0.005183 2.843592e-01

The average playtime had the highest correlation coefficient, followed by number of languages, price, rating and finally number of developers. While the P-values of average playtime, number of languages, price and rating were very small, the P-value of the number of developers (about 0.28) was not.

Since the number of developers had a low correlation coefficient and a high P-value, we can conclude that the number of developers has no correlation with the total number of reviews and does not contribute to the popularity of a game. On the other hand, average playtime has the strongest correlation with the total number of reviews and contributes the most to the popularity of a game, followed by number of languages, price and finally rating.

We can also find the correlation coefficient that rating, average playtime, price and number of languages have with the total number of reviews at different levels of popularity.
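Per-bin coefficients can be computed by repeating the correlation inside each popularity group. A minimal pandas sketch on toy numbers, with only two features and two bins for brevity:

```python
import pandas as pd

# Hypothetical toy data: 4 games per bin.
games = pd.DataFrame({
    "owners_binned": ["High"] * 4 + ["Low"] * 4,
    "total_reviews": [10, 20, 30, 40, 1, 2, 3, 4],
    "rating": [50.0, 60.0, 70.0, 80.0, 90.0, 70.0, 80.0, 60.0],
    "price": [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0],
})

features = ["rating", "price"]

# For each bin, correlate every feature with total_reviews (Pearson by default).
corr_by_bin = pd.DataFrame({
    bin_name: group[features].corrwith(group["total_reviews"])
    for bin_name, group in games.groupby("owners_binned")
}).T
print(corr_by_bin)
```

Each row of `corr_by_bin` is one bin and each column one feature, which is the layout needed for a line plot of coefficient against popularity.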

In [77]:
Out[77]:
rating avg_playtime price languages_count
Highest 0.170095 0.628458 0.126055 0.233734
High 0.305308 0.163399 0.312217 0.247698
Low 0.205851 0.041710 0.325715 0.227636
Lowest 0.019467 0.056028 0.011327 0.022582

We can now plot a graph of the correlation coefficient that rating, average playtime, price and number of languages have with the total number of reviews at different levels of popularity, to show how the correlation coefficient changes as popularity changes.

In [78]:

In the "Lowest" bin, the correlation coefficients of all 4 variables were low, showing that the correlation of these variables might not be as strong with the least popular games.

From the "Highest" bin to the "Low" bin, average playtime had a decreasing trend, price had an increasing trend, and rating and number of languages did not have any obvious trend. Average playtime had a stronger correlation with the total number of reviews, and so appears to contribute more to the popularity of a game, for games that are more popular. This is possibly because, if every game is high in quality, a successful game has to have more content and be larger in scale to stand out against its competition and succeed. On the other hand, price had a stronger correlation with the total number of reviews, and so appears to contribute more to the popularity of a game, for games that are less popular. This is possibly because, if a game is lacking in content and quality, as a lot of less popular games are, the price would be the biggest factor when players decide which game to purchase and which games will succeed, whereas if a game has more content and is higher in quality, as a lot of more popular games are, any reasonable price would be suitable. Finally, rating and the number of languages had a roughly constant correlation with the total number of reviews, so the popularity of a game does not affect how much they contribute to it. This can be because the quality and accessibility of a game are useful factors in creating a successful game, regardless of how popular the game is.

We can confirm this by training linear regression models and seeing if they fit the data. The models are trained in groups of 20, with random_state ranging from 0 to 19, and the best models are picked by hand.
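The procedure of fitting one model per seed and keeping the best can be sketched as follows. This is not the original code: the notebook presumably used a library such as scikit-learn, whereas this stand-in uses NumPy least squares on synthetic data, and the helper `fit_and_score`, the 25% test split and the R² selection rule are all assumptions.

```python
import numpy as np

def fit_and_score(X, y, seed, test_fraction=0.25):
    """Shuffle with the given seed, fit ordinary least squares on the
    training part, and return (coefficients, intercept, test R^2)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]

    # Append a column of ones so the last fitted weight is the intercept.
    A_train = np.column_stack([X[train_idx], np.ones(len(train_idx))])
    weights, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)

    A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
    residual = y[test_idx] - A_test @ weights
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y[test_idx] - y[test_idx].mean()) ** 2))
    return weights[:-1], weights[-1], 1.0 - ss_res / ss_tot

# Synthetic features standing in for rating, avg_playtime, price, languages_count.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 4))
true_coefs = np.array([1.0, 3.0, -0.5, 2.0])
y = X @ true_coefs + 10.0 + data_rng.normal(scale=0.5, size=200)

# One model per seed from 0 to 19; keep the seed with the best test R^2.
scores = {seed: fit_and_score(X, y, seed)[2] for seed in range(20)}
best_seed = max(scores, key=scores.get)
coefs, intercept, r2 = fit_and_score(X, y, best_seed)
print(best_seed, np.round(coefs, 2), round(r2, 3))
```

Coefficient sizes are only comparable across variables when the variables are on the same scale, as they are in this synthetic example. If rating, playtime, price and language counts are used in their raw units, standardising them first would make a ranking by coefficient more meaningful.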

In [79]:

We can better see the accuracy of the linear regression model by adjusting the axes.

In [80]:

According to the kdeplot, this linear regression model does somewhat accurately represent the data as the shape of the fitted value is similar to the shape of the actual value. However, the linear regression model does become less accurate for games with a higher number of total reviews, as seen by both the kdeplot and scatterplot. The number of languages had the largest coefficient and contributes the most to the popularity of a game, followed by average playtime, price and lastly rating.

In [81]:

According to the kdeplot, this linear regression model does somewhat accurately represent the data as the shape of the fitted value is similar to the shape of the actual value. Interestingly, the coefficient of price is negative, compared to the other coefficients that are positive. This makes sense, as the lower the price, the more popular and successful the game would be. The average playtime had the largest magnitude of coefficient and contributes the most to the popularity of a game, followed by number of languages, rating, and lastly price.

In [82]:

According to the kdeplot, this linear regression model does not accurately represent the data as the shape of the fitted value is not similar to the shape of the actual value. However, according to the scatterplot, the linear regression model does follow the overall trend of the data points, thus it can still be used for analysis. The number of languages had the largest coefficient and contributes the most to the popularity of a game, followed by price, rating and lastly average playtime.

In [83]:

According to the kdeplot, this linear regression model does not accurately represent the data as the shape of the fitted value is not similar to the shape of the actual value. However, according to the scatterplot, the linear regression model does follow the overall trend of the data points, thus it can still be used for analysis. The price had the largest coefficient and contributes the most to the popularity of a game, followed by number of languages, average playtime and lastly rating.

In [84]:

We can better see the accuracy of the linear regression model by adjusting the axes.

In [85]:

According to the kdeplot, this linear regression model does not accurately represent the data, as the shape of the fitted values is not similar to the shape of the actual values. The scatterplot also shows that the predicted values from the linear regression model do not follow the trend of the actual values very well. We can therefore conclude that, in this case, the 4 variables have a weaker correlation with the number of total reviews and a lesser effect on the success of a game, and that this linear regression model cannot be used for analysis.

Q4: What are factors that indie game developers have to consider when developing an indie game?

Firstly, we can analyse if developers view their game as just a passion project, or an actual legitimate game. We can do this by plotting the proportion of itch.io games that are free and paid.

In [86]:

The majority of the itch.io games are free, thus we can conclude that indie game development is still widely seen as a hobby, rather than an actual career path to make money.

We can also plot the proportion of itch.io games by length.

In [87]:

Most of the itch.io games are very short, with a majority of games only lasting a few minutes, thus we can conclude that most itch.io games are made as short passion projects, either to improve the developers' skills or as a hobby, rather than actual long games with a lot of content.

Next, we can compare the most popular genres and tags of itch.io games to find out what types of itch.io games are being produced.

Firstly, we need to find the number of itch.io games that are in each genre.

In [88]:
Out[88]:
Adventure              2977
Visual Novel           2363
Puzzle                 1983
Action                 1881
Interactive Fiction    1631
Platformer             1631
Simulation             1096
Role Playing           1023
Shooter                 731
Survival                639
Strategy                461
Rhythm                  174
Card Game               155
Fighting                155
Educational             148
Racing                  146
Sports                  112
dtype: int64
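
A count of this shape can be produced by expanding each game's genre list and tallying the values. The following is a minimal sketch, assuming a hypothetical DataFrame `itch_df` with a `genres` column that holds a list of strings per game; the notebook's actual variable and column names are not shown, so these are placeholders.

```python
import pandas as pd

# Hypothetical toy data; the real scraped itch.io table is not shown in the report.
itch_df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": [["Adventure", "Puzzle"], ["Adventure"], ["Puzzle", "Action"], []],
})

# explode() gives one row per (game, genre) pair; games with no genre become NaN.
# value_counts() then counts how many games fall in each genre.
genre_counts = itch_df["genres"].explode().dropna().value_counts()
print(genre_counts)
```

The same pattern applies to any list-valued column, such as tags, tools or platforms.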

We can now plot the number of itch.io games that are in each of the genres.

In [89]:

Many of the genres that were popular among Steam indie games were also popular among itch.io games, such as the "Adventure", "Puzzle", "Action" and "Platformer" genres. This confirms that these genres are truly popular to develop among all indie games, and not just Steam indie games or itch.io indie games.

What is worth noting is that interactive fiction games were popular among itch.io games, as seen by the "Visual Novel" and "Interactive Fiction" genres. This is similar to how interactive fiction games were also popular among Steam indie games in the "Low" bin. This could perhaps mean that indie games in itch.io are the most similar to Steam indie games in the "Low" bin.

We can also find the number of itch.io games that have each tag.

In [90]:
Out[90]:
2D              3787
Pixel Art       3266
Singleplayer    3163
Short           2341
Horror          2240
                ... 
remastered         1
carcosa            1
soft-body          1
microscopic        1
megastyle          1
Length: 6639, dtype: int64

We can now plot the number of itch.io games that have each tag.

In [91]:

Similar to the genres, many of the tags that were popular among Steam indie games were also popular among itch.io games, such as the "2D", "Pixel Art", "Singleplayer", "Short", "3D" and "Cute" tags. This confirms that these tags are truly popular to develop among all indie games, and not just Steam indie games or itch.io indie games.

Therefore, we can conclude that the patterns in the types of indie games produced that we observed in the Steam games are also consistent in itch.io games. Thus, the decisions and solutions that indie game developers use to overcome the limitations in resources and manpower on the commercial scale are also applicable in general indie game development, regardless of whether the aim is to create a legitimate game or just a hobby or passion project.

We can analyse how useful external tools and software are in indie game development by plotting the proportion of itch.io games by the number of tools and software used. These tools and software can help in the development of indie games in many different areas, such as programming, graphics and sound.

In [92]:

As the number of tools and software increases, the proportion of itch.io games decreases. However, the proportion of itch.io games that used 1 tool was almost equal to the proportion of itch.io games that used no tools, being only 0.1% lower. There is also a greater proportion of itch.io games that used at least 1 tool compared to itch.io games that used no tools at all. This shows that external tools and software are useful and are used quite frequently in indie game development.
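
The two comparisons above (the share of games at each tool count, and the share using at least 1 tool) can be computed as follows. This is a minimal sketch on made-up data; the Series `tools` is a hypothetical stand-in for the scraped tool lists.

```python
import pandas as pd

# Hypothetical toy data: each entry lists the tools one game reports using.
tools = pd.Series([
    ["Unity"], [], ["Unity", "Blender"], [], ["Bitsy"], ["Godot", "GIMP", "Audacity"],
])

# Number of tools per game, then the share of games at each count.
n_tools = tools.apply(len)
proportions = n_tools.value_counts(normalize=True).sort_index()

# Share of games that used at least one tool.
at_least_one = (n_tools >= 1).mean()
```

Comparing `proportions.loc[0]` with `at_least_one` separates the two statements: the single most common count can be 0 while most games still use one or more tools.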

We can also analyse the most popular tools and software being used in indie game development by finding the number of itch.io games that use each tool.

In [93]:
Out[93]:
Unity                2591
Bitsy                 838
"RenPy"               769
GameMaker: Studio     753
Adobe Photoshop       561
                     ... 
Amulet                  1
REXPaint                1
CryEngine               1
OGRE                    1
Pixel Vision 8          1
Length: 95, dtype: int64

We can now plot the top 20 tools and software among itch.io games.

In [94]:

Unity was the most popular tool by far, having a far greater number of itch.io games than Bitsy, the tool with the second most games. This makes sense, as Unity is widely considered to be the best game development software for beginners due to how easy it is to use, thus it will be perfect for inexperienced indie game developers.

Out of the top 20 tools and software, 12 were game engines used for programming, 6 were graphics tools that can be used to make art assets, and 2 were for creating audio and sound effects. The game engines and programming tools are Unity, Bitsy, RenPy, GameMaker: Studio, Twine, Construct, PICO-8, Godot, RPG Maker, Unreal Engine, OpenFL and PuzzleScript. The graphics and art tools are Adobe Photoshop, Aseprite, Blender, GIMP, Clip Studio Paint and Paint.net. Lastly, the audio and sound effects tools are Audacity and FL Studio.

Therefore, we can conclude that tools and software are very useful in indie game development in many different areas, such as programming, art and audio. This is especially true if the software is easy for indie game developers to use, such as Unity. As a result, external tools and software are an integral part of indie development, and many indie games use them to combat the limitations from a lack of resources and manpower.

Finally, we can analyse how accessible itch.io games are by finding the proportions of itch.io games by the number of platforms, the number of languages, the number of inputs and the number of accessibility options.

In [95]:

While there was an overall decreasing trend in the proportion of itch.io games as the number of platforms increases, the largest proportion of itch.io games supported 1 platform, rather than no platforms. There was also a greater proportion of itch.io games that supported 3 platforms compared to the proportion of itch.io games that supported 2 platforms.

To find the reason for these trends, we can find the number of itch.io games that support each platform.

In [96]:
Out[96]:
Windows    9337
HTML5      5926
macOS      5152
Linux      3863
Android     615
Flash        51
Unity        30
dtype: int64

We can now plot the number of itch.io games that support each platform.

In [97]:

The most popular platform was Windows, with HTML5, macOS, and Linux also having a large proportion of itch.io games. This can be due to how Windows, macOS and Linux are the most popular computer operating systems, while HTML5 is a massively popular markup language that is already used for structuring and presenting content throughout the Internet. This can also be why there is a greater proportion of itch.io games that supported 3 platforms compared to the proportion of itch.io games that supported 2 platforms, as it is likely that the itch.io games that supported 3 platforms supported the 3 most popular computer operating systems: Windows, macOS, and Linux.

Therefore, we can conclude that other than the most popular computer operating systems and HTML5, which is already a popular markup language on the Internet, not many other platforms are supported by itch.io games.

In [98]:
In [99]:
In [100]:

The number of languages, the number of inputs and the number of accessibility options had the same trend, where the proportion of indie games decreased as each of them increased. This can be due either to indie game developers not having enough resources or manpower to implement these options to make their games more accessible, or to developers not listing this extra information on the store page, which is where this data was scraped from. Regardless of the reason, we can conclude that itch.io indie games are not very accessible.

Results Findings & Conclusion

We can find the rise in popularity of indie games and non-indie games by plotting the total concurrent players from indie and non-indie games against time. The total concurrent players in a month can be estimated by the sum of the average concurrent players of every game in that month. A rolling average is used to smoothen out the graph.
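
The estimate described above (sum of per-game averages per month, then a rolling mean) might look like the following sketch. The column names and the 3-month window are assumptions made for illustration; the report does not state the window it uses.

```python
import pandas as pd

# Hypothetical toy data: one row per (game, month) with that game's
# average concurrent players in the month.
rows = [
    ("2020-01-01", "indie", 60), ("2020-01-01", "indie", 40), ("2020-01-01", "non-indie", 900),
    ("2020-02-01", "indie", 70), ("2020-02-01", "indie", 50), ("2020-02-01", "non-indie", 880),
    ("2020-03-01", "indie", 90), ("2020-03-01", "indie", 60), ("2020-03-01", "non-indie", 950),
]
players = pd.DataFrame(rows, columns=["month", "kind", "avg_players"])
players["month"] = pd.to_datetime(players["month"])

# Estimate total concurrent players per month as the sum over all games,
# with one column per kind of game.
monthly = players.groupby(["month", "kind"])["avg_players"].sum().unstack("kind")

# Smooth with a rolling mean; the window length is an arbitrary choice here.
smoothed = monthly.rolling(window=3, min_periods=1).mean()
```

With `min_periods=1`, the first months are averaged over fewer points instead of being dropped, which keeps the start of the curve at the cost of less smoothing there.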

In [101]:

Both indie games and non-indie games have a steady increasing trend in the total number of concurrent players from 2013 to 2020. However, non-indie games had a spike in total concurrent players from late 2017 to early 2018, before returning to the normal rate of increase in late 2018. In 2020, the rate of growth of total concurrent players for both indie and non-indie games accelerated, with a slight amount of oscillation.

However, in order to find the rise in popularity of indie games relative to non-indie games, we have to plot the proportion of concurrent players from indie and non-indie games against time, rather than the total number of concurrent players.
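
Converting totals into proportions amounts to dividing each month's row by its row sum. A minimal sketch, using made-up monthly totals in a hypothetical DataFrame `monthly`:

```python
import pandas as pd

# Hypothetical monthly totals of concurrent players (values are made up).
monthly = pd.DataFrame(
    {"indie": [100, 300, 450], "non_indie": [900, 700, 1050]},
    index=pd.to_datetime(["2013-01-01", "2017-01-01", "2021-01-01"]),
)

# Divide every row by its total so that the two columns sum to 1 in each month.
share = monthly.div(monthly.sum(axis=1), axis=0)
```

Because each row is normalised separately, a rising `share["indie"]` means indie games grew faster than non-indie games, even when both totals increase.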

In [102]:

The proportion of concurrent players from indie games had a steady increase from around 12% in 2013 to around 22% in 2022. Since the proportion of concurrent players from indie games increased over time, we can infer that indie games have had a greater rate of growth than non-indie games. There was also a small dip in 2018, which is explained by the spike in total concurrent players that non-indie games had.

We can also plot the proportion of concurrent players from indie and non-indie games against time for games with different popularity levels.

In [103]:

The proportion of concurrent players from indie games increased over time, regardless of the popularity level of the games. However, games that were less popular had a greater increase in the proportion of concurrent players from indie games over time.

Therefore, we can infer that the popularity of indie games among players has been on the rise and is catching up to the popularity of non-indie games, especially for less popular games.

Next, we can plot the total number of indie and non-indie games released against time.

In [104]:

Both indie and non-indie games had an increasing trend over time. The total number of indie games had an exponential growth from 2008 onwards, quickly surpassing the total number of non-indie games in 2015. This exponential growth can be better visualised if we instead plot the proportion of indie and non-indie games released against time.

In [105]:

The proportion of indie games released increased from less than 10% in 2000 to around 75% in 2022. Here, we can clearly see the exponential growth from 2008 onwards, as well as the proportion of indie games released reaching 50% in 2015. Due to the exponential growth of the number of indie games released from 2008 onwards, as well as how high the rate of growth of indie games is relative to non-indie games, we can conclude that the demand for indie games and the prevalence of indie games truly started to increase rapidly from 2008 onwards.

However, there is another way to compare the popularity of indie and non-indie games. By plotting boxplots of the total number of reviews of indie and non-indie games at different levels of popularity, we can infer if indie games are comparable in size and popularity to non-indie games. We have to separate the different levels of popularity into different boxplots, due to the differences in the y-axis.

In [106]:

In all 4 bins, the median of indie games is higher than the median of non-indie games. Both the indie games and non-indie games have distributions that are skewed to the right for all 4 bins. The IQR of indie games was larger than that of non-indie games in the "Highest" and "Low" bins, and vice versa for the "High" and "Lowest" bins.

The median of the total number of reviews of indie games is consistently higher than that of non-indie games, thus we can infer that indie games are still comparable in scale and popularity to non-indie games, regardless of the level of popularity.
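
The medians and IQRs read off the boxplots can also be computed directly. A minimal sketch on made-up data follows; the `bin`, `kind` and `total_reviews` column names are assumptions.

```python
import pandas as pd

# Hypothetical toy data for a single popularity bin.
games = pd.DataFrame({
    "bin": ["High"] * 6,
    "kind": ["indie"] * 3 + ["non-indie"] * 3,
    "total_reviews": [100, 200, 400, 50, 150, 450],
})

# Quartiles of total reviews for each (bin, kind) group.
summary = (
    games.groupby(["bin", "kind"])["total_reviews"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
summary.columns = ["q1", "median", "q3"]

# The interquartile range is the height of the box in a boxplot.
summary["iqr"] = summary["q3"] - summary["q1"]
```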

To summarise, the popularity of indie games among players has been on the rise and is catching up to the popularity of non-indie games, as more and more players start to play indie games over time. This is especially true for less popular games, where the rise in popularity is more rapid. As a result, the demand for indie games has risen exponentially, with many more indie games being released in the present than in the past. This increasing trend started at around 2008, when the number of indie games released started to grow exponentially. While non-indie games might still have more players than indie games, indie games are still comparable in scale and popularity to non-indie games, regardless of the level of popularity.

Q2: What are the major differences between indie games and AAA games?

Firstly, we can compare the quality of indie and non-indie games by plotting the distribution of rating of indie and non-indie games.

In [107]:

Indie games have a higher median than non-indie games. Both indie and non-indie games have distributions that were skewed to the left. Indie games have a smaller IQR than non-indie games. Both indie and non-indie games have outliers below the lower bound.

This pattern is also consistent at different popularity levels.

In [108]:

In all 4 bins, indie games have a higher median than non-indie games, both indie and non-indie games have distributions that were skewed to the left, indie games have a smaller IQR than non-indie games, and both indie and non-indie games have outliers below the lower bound.

It is also worth noting that indie and non-indie games in the "Highest" bin have the greatest difference in rating medians compared to the other 3 bins. This can show that as the level of popularity increases, the difference in quality between indie and non-indie games becomes larger, where more popular indie games would be much more well-received compared to non-indie games of around the same popularity.

Regardless of the level of popularity, indie games are overall more enjoyable and more positively received than non-indie games, as seen by the higher median of indie games. There is also less variation in quality in indie games than in non-indie games, shown by the smaller IQR of indie games.

We can plot the proportions of indie and non-indie games that are in each of the top 11 genres to see what types of indie and non-indie games are being developed.

In [109]:

The "Action", "Casual" and "Adventure" genres were the 3 top genres for both indie and non-indie games. However, there is a higher proportion of indie games that are in these 3 top genres compared to non-indie games. Other than the top 3 genres, there is also a higher proportion of indie games in the "RPG" and "Early Access" genres, whereas the "Strategy", "Simulation", "Free to Play", "Sports", "Racing" and "Massively Multiplayer" genres have a higher proportion of non-indie games.

This graph shows us that indie games are not as diverse in their genres compared to non-indie games, as seen by a higher proportion of indie games being in the top 3 genres instead of having a more even distribution. This can be due to the limitations that indie games face but non-indie games do not, restricting the genres of games that indie game developers can produce. For example, games in the "Simulation", "Sports" and "Racing" genres might require a level of realism in terms of graphics and gameplay, which might require more resources and manpower than indie games have. Games in the "Strategy" genre might require more complicated and in-depth game mechanics to keep players hooked, while games in the "Massively Multiplayer" genre would require running servers to support multiplayer, both of which might be difficult for an individual to implement if they do not have the prior knowledge and resources.

Therefore, there are some types of games that would be more difficult to produce and implement, which indie games tend to stay away from due to limitations in resources and manpower.

However, indie game developers are able to deal with these limitations in resources and manpower in ingenious ways. We can see how by finding the top 20 tags with the greatest ratio of indie games to non-indie games, and vice versa.
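
One way such a ranking could be computed is sketched below. It uses raw game counts per tag and drops tags that are missing from either group so that the ratio is defined; whether the notebook normalises by group size or handles zero counts this way is not shown, so both choices are assumptions.

```python
import pandas as pd

# Hypothetical toy data: tags per game and an indie flag.
games = pd.DataFrame({
    "is_indie": [True, True, True, False, False, False, False],
    "tags": [
        ["Short", "Puzzle"], ["Short"], ["Puzzle", "Classic"],
        ["Classic"], ["Classic", "Puzzle"], ["Short", "Classic"], ["Puzzle"],
    ],
})

# One row per (game, tag) pair, then a tag-by-group table of game counts.
exploded = games.explode("tags").reset_index(drop=True)
counts = pd.crosstab(exploded["tags"], exploded["is_indie"])
counts = counts.rename(columns={False: "non_indie", True: "indie"})

# Keep only tags present in both groups, then rank by indie-to-non-indie ratio.
both = counts[(counts["indie"] > 0) & (counts["non_indie"] > 0)]
ratio = (both["indie"] / both["non_indie"]).sort_values(ascending=False)
top_indie_tags = ratio.head(20)
```

Sorting the same ratio in ascending order, or inverting it, gives the tags that lean most towards non-indie games.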

In [110]:

Now, we can finally see some patterns in the types of games being developed as indie and non-indie games.

For indie games, "Short" is the top tag by quite a margin. This makes sense, as indie game developers usually do not have the resources or manpower to create extremely long games with a lot of content. However, some of the other tags do give us an idea of how indie game developers solve these problems. For example, some indie games contain some sort of procedural generation, which is an algorithmic process of generating gameplay. This can allow the gameplay to feel fresh and unrepetitive without the need for the human touch, increasing the replay value of indie games. As it turns out, "Procedural Generation" and "Replay Value" are both included in the top 20 tags for indie games. Some examples of games that use procedural generation are roguelikes and roguelites, which both appear as the tags "Roguelike" and "Roguelite". Some indie games also make gameplay more fun by making it more difficult or fast-paced and requiring time to master, which can explain the tags "Difficult" and "Fast-Paced". Some of these games can include "Bullet Hell", "Top-Down Shooter" and "Shoot Em Up", which also appear as tags. Finally, platformers and puzzle games are quite popular among indie games, with the tags "Puzzle Platformer", "Platformer", "Logic" and "Puzzle" all appearing in the top 20. This can be due to puzzle games and platformers usually having simpler types of gameplay than other types of games.

On the other hand, for non-indie games, "Classic" is the top tag by quite a margin. This can be due to many non-indie games being seen as classics or having recognisable characters in them. There are also some types of games that are more complicated and require more resources and manpower. There are the "Historical", "Military", "War" and "Driving" tags, where these games have to be as realistic as possible, making them complex to develop. There are the "RTS", "JRPG", "Turn-Based Strategy" and "Tactical" tags, where the games need in-depth strategy and enough balancing to create interesting gameplay. There are also games with the "Open World" tag that are usually at very large scales, as they have to incentivise players to explore a world that has to be large enough. Lastly, there are the "Multiplayer", "Online Co-Op", "VR", "PvP" and "Co-op" tags, which would require additional infrastructure or hardware, such as servers and VR headsets, in order to run.

Therefore, these graphs and tags show that in order to combat the lack of resources and manpower, there are some patterns that emerge among indie games, such as making gameplay more unique or interesting, as well as sticking to types of games that are easier to develop over other types that can be more difficult to develop.

To summarise, indie game developers face many restrictions and limitations when developing their games due to a lack of resources and manpower. One such restriction is not being able to produce types of games that are more difficult to produce and implement, whether because they require hyperrealistic graphics and gameplay, require complex and in-depth game mechanics to create interesting gameplay and keep the player hooked, are too large in scale, or require additional infrastructure or hardware, such as servers and VR headsets, to run. However, indie game developers are able to deal with these limitations in resources and manpower in ingenious ways. Some of these ways include making gameplay more unique and interesting, either through procedural generation or through difficult or fast-paced gameplay that requires time to master, or by making games that are simpler to develop. As a result, even with the difference in resources and manpower, indie games can still be of the same quality as non-indie games, perhaps even of a higher quality. Indie games can be just as enjoyable and positively received as non-indie games.

Q3: What factors contribute to the success of an indie game?

If a game is of a higher quality and is made more accessible for players, the developers of the game would provide players with more language options within their game. Therefore, we can compare the distribution of the number of languages of indie games of different popularities to compare how accessible indie games of different popularities are, as well as to give a general gauge of the quality of indie games of different popularities.

In [111]:

As popularity increases, the density of indie games with fewer than 5 languages decreases, while the density of indie games with 5 or more languages increases. While "High", "Low" and "Lowest" had similar shapes, having a peak at 1 language before decreasing in density as the number of languages increases, "Highest" had a completely different shape, being much more spread out with a maximum density at around 10 languages.

Therefore, we can conclude that overall, as popularity increases, the number of languages increases. This implies that indie games that are more popular would be of a higher quality and are more accessible for players.

We can find how the rating, average playtime, price and number of languages of a game affect the popularity of a game. To do this, we can find the correlation coefficient that rating, average playtime, price and number of languages each have with the total number of reviews at different levels of popularity.
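
Per-bin correlation coefficients of this kind can be computed with `DataFrame.corrwith` inside each popularity bin. The sketch below uses made-up data and assumed column names, and it uses the Pearson coefficient, which is the pandas default; the report does not say which coefficient it uses.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions, not the notebook's.
games = pd.DataFrame({
    "bin": ["High"] * 4 + ["Low"] * 4,
    "rating": [70, 80, 85, 95, 60, 65, 75, 70],
    "avg_playtime": [5, 9, 14, 20, 2, 1, 3, 2],
    "price": [10, 15, 20, 20, 5, 4, 2, 1],
    "n_languages": [1, 3, 5, 8, 1, 1, 2, 2],
    "total_reviews": [1000, 2000, 3500, 5000, 10, 20, 40, 60],
})

features = ["rating", "avg_playtime", "price", "n_languages"]

# Pearson correlation of each feature with total reviews, computed within each bin.
corr_by_bin = pd.DataFrame({
    name: group[features].corrwith(group["total_reviews"])
    for name, group in games.groupby("bin")
}).T
```

Each row of `corr_by_bin` is one popularity bin and each column is one variable, so trends across bins can be read down a column.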

In [112]:

In the "Lowest" bin, the correlation coefficients of all 4 variables were low, showing that the correlation of these variables might not be as strong with the least popular games.

From the "Highest" bin to the "Low" bin, average playtime had a decreasing trend, price had an increasing trend, and rating and number of languages did not have any obvious trend. Average playtime had a stronger correlation with the total number of reviews and contributes more to the popularity of a game for games that are more popular. This can possibly be because, if every game is high in quality, a successful game has to have more content and be larger in scale for it to stand out against its competition and succeed. On the other hand, price had a stronger correlation with the total number of reviews and contributes more to the popularity of a game for games that are less popular. This can possibly be because, if a game is lacking in terms of content and quality, as a lot of less popular games are, the price would be the biggest factor when deciding which games players will purchase and which games will succeed, whereas if a game has more content and is higher in quality, as a lot of more popular games are, any reasonable enough price would be suitable. Finally, rating and the number of languages had a roughly constant correlation with the total number of reviews, so the popularity of a game does not affect how much they contribute to the popularity of a game. This can be due to how the quality and accessibility of a game are useful factors in creating a successful game, regardless of how popular the game is.

We can confirm this by training linear regression models and seeing if they fit the data. The models are trained in groups of 20, with random_state ranging from 0 to 19, and the best models are picked by hand.
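
A training loop of this shape might look like the following sketch using scikit-learn. Several details are assumptions rather than the notebook's method: the data are synthetic, the features are standardised so that coefficient magnitudes can be compared, the test split is 20%, and the best model is chosen by test R² instead of by hand.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for one popularity bin.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "rating": rng.uniform(50, 100, 200),
    "avg_playtime": rng.uniform(0, 50, 200),
    "price": rng.uniform(0, 30, 200),
    "n_languages": rng.integers(1, 15, 200),
})
y = 20 * X["avg_playtime"] + 5 * X["rating"] - 3 * X["price"] + rng.normal(0, 10, 200)

# Standardise features so coefficient magnitudes are comparable (an assumption).
X_std = (X - X.mean()) / X.std()

results = []
for seed in range(20):  # random_state 0..19, as described in the text
    X_train, X_test, y_train, y_test = train_test_split(
        X_std, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    results.append((seed, model.score(X_test, y_test), model.coef_))

# Pick the split with the highest test R^2 (a stand-in for picking by hand).
best_seed, best_r2, best_coef = max(results, key=lambda r: r[1])
```

Note that choosing the best of 20 splits by its test score gives an optimistic picture of the fit, so the selected R² is better read as an upper bound than as a typical value.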

In [113]:

We can better see the accuracy of the linear regression model by adjusting the axes.

In [114]:

According to the kdeplot, this linear regression model does somewhat accurately represent the data as the shape of the fitted value is similar to the shape of the actual value. However, the linear regression model does become less accurate for games with a higher number of total reviews, as seen by both the kdeplot and scatterplot. The number of languages had the largest coefficient and contributes the most to the popularity of a game, followed by average playtime, price and lastly rating.

In [115]:

According to the kdeplot, this linear regression model does somewhat accurately represent the data as the shape of the fitted value is similar to the shape of the actual value. Interestingly, the coefficient of price is negative, compared to the other coefficients that are positive. This makes sense, as the lower the price, the more popular and successful the game would be. The average playtime had the largest magnitude of coefficient and contributes the most to the popularity of a game, followed by number of languages, rating, and lastly price.

In [116]:

According to the kdeplot, this linear regression model does not accurately represent the data as the shape of the fitted value is not similar to the shape of the actual value. However, according to the scatterplot, the linear regression model does follow the overall trend of the data points, thus it can still be used for analysis. The number of languages had the largest coefficient and contributes the most to the popularity of a game, followed by price, rating and lastly average playtime.

In [117]:

According to the kdeplot, this linear regression model does not accurately represent the data as the shape of the fitted value is not similar to the shape of the actual value. However, according to the scatterplot, the linear regression model does follow the overall trend of the data points, thus it can still be used for analysis. The price had the largest coefficient and contributes the most to the popularity of a game, followed by number of languages, average playtime and lastly rating.

In [118]:

We can better see the accuracy of the linear regression model by adjusting the axes.

In [119]:

According to the kdeplot, this linear regression model does not accurately represent the data as the shape of the fitted value is not similar to the shape of the actual value. The scatterplot also shows that the predicted values from the linear regression model do not follow the trend of the actual values very well. We can conclude that the 4 variables have a weaker correlation with the number of total reviews and a lesser effect on the success of a game, so we cannot use this linear regression model for analysis.

We can see that the linear regression plots do in fact follow the trends of the correlation coefficients.

The "Lowest" bin has little to no correlation in all 4 variables. Average playtime had a decreasing trend in how it affects the success of the game as popularity decreases, having the highest coefficient out of all of the 4 variables in the "Highest" bin, before having the steepest constant decrease in relative effect on success. Price had an increasing trend in how it affects the success of the game as popularity decreases, having the highest coefficient out of all of the 4 variables in the "Low" bin, after having the steepest constant increase in relative effect on success. Rating and the number of languages did not have any obvious trend in how they affect the success of the game as popularity changes, both having around the same relative effect on success for all 3 bins.

To summarise, the main factor in the success of an indie game is the quality of the game, where in order to be one of the more popular games, indie games would have to be as polished and high-quality as possible. This is shown by how the number of languages of a game and the rating of a game, two general gauges of the quality of an indie game, would have a similar amount of effect on the success of a game regardless of how popular the game is, showing that the quality of an indie game contributes the most to its success, regardless of its popularity. Other factors include the length of the game and how much content it has, which contributes the most to the success of very popular games, as well as the price of the game, which contributes the most to the success of less popular games.

Q4: What are factors that indie game developers have to consider when developing an indie game?

Firstly, we can analyse if developers view their game as just a passion project, or an actual legitimate game. We can do this by plotting the proportion of itch.io games that are free and paid.

In [120]:

The majority of the itch.io games are free, thus we can conclude that indie game development is still widely seen as a hobby, rather than an actual career path to make money.

We can also plot the proportion of itch.io games by length.

In [121]:

Most of the itch.io games are very short, with a majority of games only lasting a few minutes, thus we can conclude that most itch.io games are made as short passion projects, either to improve the developers' skills or as a hobby, rather than actual long games with a lot of content.

Next, we can compare the most popular genres and tags of itch.io games to find out what types of itch.io games are being produced.

Firstly, we need to find the number of itch.io games that are in each genre.

In [122]:

Many of the genres that were popular among Steam indie games were also popular among itch.io games, such as the "Adventure", "Puzzle", "Action" and "Platformer" genres. This confirms that these genres are truly popular to develop among all indie games, and not just Steam indie games or itch.io indie games.

We can also find the number of itch.io games that have each tag.

In [123]:

Similar to the genres, many of the tags that were popular among Steam indie games were also popular among itch.io games, such as the "2D", "Pixel Art", "Singleplayer", "Short", "3D" and "Cute" tags. This confirms that these tags are truly popular to develop among all indie games, and not just Steam indie games or itch.io indie games.

Therefore, we can conclude that the patterns in the types of indie games produced that we observed in the Steam games are also consistent in itch.io games. Thus, the decisions and solutions that indie game developers use to overcome the limitations in resources and manpower on the commercial scale are also applicable in general indie game development, regardless of whether the aim is to create a legitimate game or just a hobby or passion project.

We can analyse how useful external tools and software are in indie game development by plotting the proportion of itch.io games by the number of tools and software used. These tools and software can help in the development of indie games in many different areas, such as programming, graphics and sound.

In [124]:

As the number of tools and software increases, the proportion of itch.io games decreases. However, the proportion of itch.io games that used 1 tool was almost equal to the proportion of itch.io games that used no tools, being only 0.1% lower. There is also a greater proportion of itch.io games that used at least 1 tool compared to itch.io games that used no tools at all. This shows that external tools and software are useful and are used quite frequently in indie game development.

We can also analyse which tools and software are the most popular in indie game development by finding the top 20 tools and software among itch.io games.

In [125]:

Unity was the most popular tool by far, with far more itch.io games than Bitsy, the tool with the second most games. This makes sense, as Unity is widely regarded as a beginner-friendly game engine that is easy to pick up, which makes it well suited to inexperienced indie game developers.

Out of the top 20 tools and software, 12 were game engines or programming tools, 6 were graphics tools used to make art assets, and 2 were for creating audio and sound effects. The game engines and programming tools are Unity, Bitsy, RenPy, GameMaker: Studio, Twine, Construct, PICO-8, Godot, RPG Maker, Unreal Engine, OpenFL and PuzzleScript. The graphics and art tools are Adobe Photoshop, Aseprite, Blender, GIMP, Clip Studio Paint and Paint.net. Lastly, the audio and sound effect tools are Audacity and FL Studio.
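The grouping above can be checked with a small tally. The sketch below hard-codes the category of each tool exactly as listed in the paragraph; the `TOOL_CATEGORIES` dictionary and its category labels are illustrative names, not part of the original analysis.

```python
from collections import Counter

# Category of each top-20 tool, as grouped in the text above.
# The labels themselves are illustrative.
TOOL_CATEGORIES = {
    "Unity": "engine/programming",
    "Bitsy": "engine/programming",
    "RenPy": "engine/programming",
    "GameMaker: Studio": "engine/programming",
    "Twine": "engine/programming",
    "Construct": "engine/programming",
    "PICO-8": "engine/programming",
    "Godot": "engine/programming",
    "RPG Maker": "engine/programming",
    "Unreal Engine": "engine/programming",
    "OpenFL": "engine/programming",
    "PuzzleScript": "engine/programming",
    "Adobe Photoshop": "graphics",
    "Aseprite": "graphics",
    "Blender": "graphics",
    "GIMP": "graphics",
    "Clip Studio Paint": "graphics",
    "Paint.net": "graphics",
    "Audacity": "audio",
    "FL Studio": "audio",
}

# Count how many of the top 20 tools fall into each category.
category_counts = Counter(TOOL_CATEGORIES.values())
print(category_counts)
```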

Therefore, we can conclude that tools and software are very useful in many different areas of indie game development, such as programming, art and audio. This is especially true if the software is easy for indie game developers to use, such as Unity. As a result, external tools and software are an integral part of indie game development, and many indie games use them to combat the limitations that come from a lack of resources and manpower.

In summary, one of the factors that indie game developers have to consider is why they are developing an indie game. If it is for a game marketplace like Steam, then it will most likely be a legitimate game, and indie game development is more of a career than a hobby. However, if it is for a more casual platform like itch.io, then it can just be a short passion project, made either to improve skills or as a hobby, rather than a long game with a lot of content. In that case, indie game development is seen as a hobby rather than a career path to make money. Regardless of the reason, the type of game they choose to make and the design choices they make are also very important factors. The decisions and solutions that indie game developers use to overcome limitations in resources and manpower apply regardless of whether the goal is to create a legitimate game that generates income or just a hobby or passion project. Lastly, choosing which external tools and software to use is an important factor. These tools are very useful in many different areas of indie game development, such as programming, art and audio. This is especially true if the software is easy for indie game developers to use, such as Unity. As a result, external tools and software are an integral part of indie game development, and many indie games use them to combat the limitations that come from a lack of resources and manpower.

Conclusion

In conclusion, the popularity of indie games among players has been on the rise and is catching up to that of non-indie games, with more and more players starting to play indie games as time goes on. This is especially true for less popular games, where the rise in popularity is more rapid. As a result, the demand for indie games has risen sharply, and many more indie games are being released now than in the past. This trend started around 2008, when the number of indie games released began to grow exponentially. While non-indie games might still have more players than indie games, indie games are comparable in scale and popularity to non-indie games at every level of popularity.

This could be due to how indie game developers deal with limitations in resources and manpower. Despite the many restrictions they face when developing their games, indie game developers are able to work around these limitations in ingenious ways. As a result, even with the difference in resources and manpower, indie games can be of the same quality as non-indie games, perhaps even of a higher quality, and can be just as enjoyable and positively received.

There are also many factors that contribute to the success of an indie game. The main factor is the quality of the game: to be among the more popular games, an indie game has to be as polished and high-quality as possible. Other factors include the length of the game and how much content it has, which contribute the most to the success of very popular games, as well as the price of the game, which contributes the most to the success of less popular games.

In order to achieve this success, indie game developers have to consider many factors when developing an indie game. One of these factors is why they are developing the game. If it is for a game marketplace like Steam, then it will most likely be a legitimate game, and indie game development is more of a career than a hobby. However, if it is for a more casual platform like itch.io, then it can just be a short passion project, made either to improve skills or as a hobby, rather than a long game with a lot of content. In that case, indie game development is seen as a hobby rather than a career path to make money. Regardless of the reason, the type of game they choose to make and the design choices they make are also very important factors. The decisions and solutions that indie game developers use to overcome limitations in resources and manpower apply regardless of whether the goal is to create a legitimate game that generates income or just a hobby or passion project. Lastly, choosing which external tools and software to use is an important factor. These tools are very useful in many different areas of indie game development, such as programming, art and audio. This is especially true if the software is easy for indie game developers to use, such as Unity. As a result, external tools and software are an integral part of indie game development, and many indie games use them to combat the limitations that come from a lack of resources and manpower.

Recommendations or Further Works

Other than Steam games and itch.io games, I could also analyse the trends of games submitted to game jams. Game jams are online competitions where contestants are given a short period of time, usually weeks, days or even hours, to create a fully fledged and functioning game. Since contestants have to develop a full game in such a short period of time, there might be even more ingenious ways in which indie developers overcome the even more limited resources they have available.

I could also try finding out the relationships between different tags and genres. Perhaps some combinations of tags or genres are much more effective than others. By analysing these combinations, we could carry out a more in-depth analysis of the types of games that indie game developers produce and obtain more meaningful results.
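As a starting point for this kind of analysis, one simple option is to count how often each pair of tags appears on the same game. The sketch below is only one possible approach on made-up data; the `count_tag_pairs` helper and the sample tag lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_tag_pairs(tag_lists):
    """Count how many games carry each unordered pair of tags."""
    pair_counts = Counter()
    for tags in tag_lists:
        # Sort and de-duplicate so ("2D", "Short") and ("Short", "2D")
        # are counted as the same pair, and at most once per game.
        unique_sorted = sorted(set(tags))
        pair_counts.update(combinations(unique_sorted, 2))
    return pair_counts


# Hypothetical sample: one list of tags per game.
sample_tags = [
    ["2D", "Pixel Art", "Short"],
    ["2D", "Pixel Art"],
    ["3D", "Short"],
]

pair_counts = count_tag_pairs(sample_tags)
print(pair_counts.most_common(3))
```

Pair counts alone only show which combinations are common; judging which are more effective would additionally require relating each pair to a success measure such as ratings or player counts.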

References